A Neurally Controlled Computer Game Avatar With Humanlike Behavior

David Gamez, Zafeirios Fountas, Member, IEEE, and Andreas K. Fidjeland, Member, IEEE

Abstract—This paper describes the NeuroBot system, which uses a global workspace architecture, implemented in spiking neurons, to control an avatar within the Unreal Tournament 2004 (UT2004) computer game. This system is designed to display humanlike behavior within UT2004, which provides a good environment for comparing human and embodied AI behavior without the cost and difficulty of full humanoid robots. Using a biologically inspired approach, the architecture is loosely based on theories about the high-level control circuits in the brain, and it is the first neural implementation of a global workspace that has been embodied in a complex dynamic real-time environment. NeuroBot's humanlike behavior was tested by competing in the 2011 BotPrize competition, in which human judges play UT2004 and rate the humanness of other avatars that are controlled by a human or a bot. NeuroBot came a close second, achieving a humanness rating of 36%, while the most human human reached 67%. We also developed a humanness metric ᛗ that combines a number of statistical measures of an avatar's behavior into a single number. In our experiments with this metric, NeuroBot was rated as 33% human, and the most human human achieved 73%.

Index Terms—Benchmarking, BotPrize, competitions, computer game, evaluation, global workspace, neural networks, serious games, spiking neural networks, Turing test.

Manuscript received February 19, 2012; revised July 16, 2012; accepted October 24, 2012. Date of publication November 20, 2012; date of current version March 13, 2013. This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) under Grant EP/F033516/1.
D. Gamez was with Imperial College London, London SW7 2AZ, U.K. He is now with the University of Sussex, Brighton, East Sussex BN1 9RH, U.K.
Z. Fountas and A. K. Fidjeland are with Imperial College London, London SW7 2AZ, U.K. (e-mail: andreas.fidjeland@imperial.ac.uk).
Digital Object Identifier 10.1109/TCIAIG.2012.2228483

I. INTRODUCTION

IN 1950, Turing [1] proposed that an imitation game could be used to answer the question whether a machine can think. In this game, a human judge engages in conversation with a human and a machine and tries to identify the human. The conversation is conducted over a text-only channel, such as a computer console interface, to ensure that the voice or body of the machine does not influence the judge's decision. While Turing posed his original test as a way of answering the question whether machines can think, it is now more typically seen as a way of evaluating the extent to which machines have achieved human-level intelligence.

Since the Turing test was proposed, there has been a substantial amount of work on programs (chatterbots) that attempt to pass the test by imitating the conversational behavior of humans. Many of these compete in the annual Loebner Prize competition [2]. None of these programs has passed the Turing test so far, and they often rely on conversational tricks for their apparent realism, for example, parrying the interlocutor with a question when they cannot process or understand the input [3].

One reason why chatterbots have failed to pass the Turing test is that they are not embodied in the world. For example, a chatterbot can only respond to the question "Describe what you see on your left" if the programmers of the chatterbot put in the answer when they created the program. This lack of embodiment also leads to a constant need for dissimulation on the part of the chatterbot: when asked about its last illness, it has to make something up. There is also an emerging consensus within the AI community that many of the things that we do with our bodies, such as learning a new motor action, require and exhibit intelligence, and most AI research now focuses on intelligent actions in the world, rather than conversation.

This emphasis on embodied AI has led to a number of variations of the Turing test. In addition to its bronze medal for humanlike conversation, the Loebner Prize has a gold medal for a system that can imitate the body and voice of a human. Harnad [4] has put forward a T3 version of the Turing test, in which a system has to be indistinguishable from a human in its external behavior for a lifetime, and a T4 version in which a machine has to be indistinguishable in both external and internal function. The key challenge with these more advanced Turing tests is that it takes a tremendous amount of effort to produce and control a convincing robot body with similar degrees of freedom to the human body. Although some progress has been made with the production of robots that closely mimic the external appearance of humans [5], and people are starting to build robots that are closely modeled on the internal structure of the human body [6], these robots are available to very few researchers and the problems of controlling such robots in a humanlike manner are only just starting to be addressed.

A more realistic way of testing humanlike intelligence in an approximation to an embodied setting has been proposed by Hingston [7], who created the BotPrize competition [8]. In this competition, software programs (or bots) compete to provide the most humanlike control of avatars within the Unreal Tournament 2004 (UT2004) computer game environment. Human judges play the game and rate the humanness of other avatars, which might be controlled by a human or a bot, with the most humanlike bot winning the competition. If a bot became indistinguishable from the human players, then it might be considered to have human-level intelligence within the UT2004 environment according to Turing [1]. This type of work has practical applications, since it is more challenging and rewarding to play computer games against opponents that display humanlike behavior. Humanlike bot behavior can also be applied in

agent-based simulations in economics and the social sciences, and used in film special effects to model the behavior of crowds

and background characters in a natural manner. This paper describes the NeuroBot system that we developed to produce humanlike control of an avatar within the UT2004 environment. 1 Our working hypothesis was that biologically inspired neural mechanisms could be an effective way of constructing a system with humanlike behavior. This approach enables us to learn from the neural circuits that produce our behavior, and the models developed by biologically inspired robotics can contribute to the neuroscientific understanding of the brain. The high-level coordination and control of our system is carried out using a global workspace architecture [9], a popular theory about conscious control in the brain that explains how a serial procession of conscious states could be produced by interactions between multiple parallel processes. This global workspace coordinates the interactions between specialized modules that operate in parallel and compete to gain access to the global workspace. Both the workspace and the specialized modules are implemented in biologically inspired circuits of spiking neurons, with CUDA hardware acceleration on a graphics card being used to simulate the network’s 20 000 neurons and 1.5 million connections in close to real time.

The humanness of NeuroBot's behavior was tested by competing in the 2011 BotPrize competition and by participating in a conference panel session on the Turing test. These events showed that human behavior within UT2004 varies considerably and that human judgments are not a particularly reliable way of identifying human behavior within this environment. To explore this further, we developed a humanness metric ᛗ,2 which combines a number of statistical measures of an avatar's behavior into a single number. This was used to compare NeuroBot with human subjects as well as with the Hunter bot supplied with the Pogamut bot toolkit for UT2004 [10]. In the future, ᛗ could be used to discriminate between human- and bot-controlled avatars, and to evolve systems with humanlike behavior.

The first part of this paper provides background information about the global workspace architecture and describes previous related work on spiking neural networks, robotics, and humanlike control of computer game avatars (Section II). Next, the architecture of our system is described, including the structure of the global workspace, the specialized parallel processors, and the spike conversion mechanisms (Section III). We then cover the implementation of our system (Section IV) and outline the methods that we used to evaluate the humanness of NeuroBot's behavior (Section V). Finally, we conclude with a discussion and plans for future work.

1 A download of the latest version of NeuroBot (binary and source code) is available here: http://www.doc.ic.ac.uk/zf509/neurobot.html. This includes a C++ wrapper for interfacing with GameBots.
2 ᛗ (mannaz) means "man" in the runic alphabet.

Fig. 1. Abstract global workspace architecture. (a) Three parallel processes compete to place their information in the global workspace (the arrows indicate transmission of information from parallel processes to the global workspace). The competition between processes could be mediated by dynamically changing connection strengths from the parallel processes to the global workspace or by a saliency value attached to each parallel process. (b) The process on the right wins and its information (represented as a triangle) is copied into the global workspace. (c) Information in the global workspace is broadcast back to the parallel processes and the cycle of competition and broadcast begins again. More information about the abstract global workspace architecture can be found in [9] and a detailed description of a neural implementation is available in [11].

II. BACKGROUND

A. Global Workspace

The global workspace architecture was first put forward by Baars [9] as a model of consciousness in which separate parallel processes are coordinated to produce a single stream of control. The basic idea is that the parallel processes compete to place their information in the global workspace. The winning information is then broadcast to all of the other processes, which initiates another cycle of competition and broadcast (Fig. 1). Different types of process are specialized for different tasks, such as analyzing perceptual information or carrying out actions, and processes can also form coalitions that work toward a common goal. These mechanisms enable global workspace theory to account for the ability of consciousness to handle novel situations, its serial procession of states, and the transition of information between consciousness and unconsciousness.
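In outline, the competition-broadcast cycle of Fig. 1 can be expressed in a few lines of C++. The sketch below is our illustration of the abstract architecture, not a description of any particular implementation; the Process type, its salience field, and the string-valued workspace content are illustrative assumptions.

    #include <algorithm>
    #include <string>
    #include <vector>

    // One specialized parallel process. Each process holds the content it
    // would like to broadcast and a salience value that measures how
    // strongly it is currently bidding for the workspace.
    struct Process {
        std::string content;
        double salience;

        // After a broadcast, every process updates its own state in the
        // light of the globally available information (simplified here).
        void receiveBroadcast(const std::string& broadcast) { (void)broadcast; }
    };

    // One competition-broadcast cycle: the most salient process wins
    // [Fig. 1(a)], its content is copied into the workspace [Fig. 1(b)],
    // and the workspace content is broadcast to all processes [Fig. 1(c)].
    // Assumes processes is non-empty.
    std::string workspaceCycle(std::vector<Process>& processes) {
        auto winner = std::max_element(
            processes.begin(), processes.end(),
            [](const Process& a, const Process& b) { return a.salience < b.salience; });
        std::string workspace = winner->content;
        for (Process& p : processes) {
            p.receiveBroadcast(workspace);
        }
        return workspace;  // one state in the serial stream of control
    }

Iterating workspaceCycle produces the serial procession of states that the theory describes: a parallel competition resolved into a single broadcast on every cycle.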

A number of neural models of the global workspace architecture have been built to investigate how it might be implemented in the brain. Dehaene et al. created a neural simulation to study how a global workspace and specialized processes interact during the Stroop task and the attentional blink [12]-[14], and a related model was constructed by Zylberberg et al. [15]. A brain-inspired cognitive architecture based on global workspace theory has been developed by Shanahan [16], which controlled a simulated wheeled robot, and Shanahan [11] built a global workspace model using simulated spiking neurons, which showed how a biologically plausible implementation of the global workspace architecture could move through a serial progression of stable states. More recently, Shanahan [17] hypothesized that neural synchronization could be the mechanism that implements a global workspace in the brain, and he is developing neural and oscillator models of how this could work.

The global workspace architecture has also proved to be a successful way of controlling complex software systems. For example, Franklin's [18] naval dispatching system was created to assign sailors to new billets at the end of their tour of duty. This task involves natural language conversation, interaction with databases, adherence to Navy policy, and checks on job requirements, costs, and sailors' job satisfaction. These functions were carried out using a large number of codelets that were specialized for different tasks and organized using a global workspace architecture. A version of this system that is capable of learning has also been developed [19], [20].


Another successful software system based on a global workspace architecture is CERA-CRANIUM [21], which won the BotPrize in 2010 and the humanlike bots competition at the 2011 IEEE Congress on Evolutionary Computation. CERA-CRANIUM uses two global workspaces: the first receives simple percepts, aggregates them into more complex percepts, and passes them on to the second workspace. In the second workspace, the complex percepts are used to generate mission percepts that set the high-level goals of the system.

While programmatic implementations of the global workspace architecture have been used to solve challenging problems in real time, with the exception of [16], the neural implementations have typically been disembodied models that are designed to show how a global workspace architecture might be implemented in the brain. One aim of the research described in this paper is to close the gap between successful programmatic implementations and the models that have been developed in neuroscience: using spiking neurons to implement a global workspace that controls a system within a challenging real-time environment. Addressing this type of problem with a neural architecture will also help to clarify how a global workspace might be implemented in the brain.

B. Spiking Neural Networks and Robotics

The work described in this paper takes place against a background of previous research on the control of robots using biologically inspired neural networks. There has been a substantial amount of research in this area, with a particularly good example being the work of Krichmar et al. [22], who used a network of 90 000 rate-based neuronal units and 1.4 million connections to control a wheeled robot. This network included models of the visual system and hippocampus, and it used learning to solve the water maze task. The work that has been done on the control of robots using spiking neural networks includes a network with 18 000 neurons and 700 000 connections developed by Gamez [23], which controlled the eye movements of a virtual robot and used a learned association between motor output and visual input to reason about future actions. In other work, Bouganis and Shanahan [24] developed a spiking network that autonomously learned to control a robot arm after an initial period of motor babbling; there has been some research on the evolution of spiking neural networks to control robots [25]; and a self-organizing spiking neural network for robot navigation has been developed by Nichols et al. [26]. As far as we are aware, none of the previous work in this area has used a global workspace implemented in spiking neurons to control a computer game avatar.

C. Humanlike Control of Computer Game Avatars

A variety of approaches have been used to produce humanlike behavior in computer games, such as UT2004 and Quake. For example, a substantial amount of research has been carried out on evolutionary approaches [27]-[32] and many groups have used machine learning to copy human behavior [33]-[37]. Other approaches include a self-organizing neural network with reinforcement learning developed by Wang et al. [38], and UT^2's hybrid system, which uses an evolved neural network to control combat behavior and replays prerecorded human paths when the bot gets stuck [39]. Of particular relevance to NeuroBot is the hierarchical learning-based controller developed by van Hoorn et al. [40]. In this system, the execution of different tasks is carried out by specialized low-level modules, and a behavior selector chooses which controller is in control of the bot. Both the specialized modules and the behavior selector are neural networks, which were developed using an evolutionary algorithm.

Related work on the evaluation of humanlike behavior in computer games has been carried out by Chang et al. [41], who developed a series of metrics to compare human and artificial agents' behavior in the Social Ultimatum game. A trained neural gas has been used to distinguish between human- and bot-controlled avatars in UT2004 [37], and Pao et al. [42] developed a number of measures (including path entropy) for discriminating between human and bot paths in Quake 2.

III. NEURONAL GLOBAL WORKSPACE

This section describes the global workspace architecture shown in Fig. 2, which enables NeuroBot to switch between different behaviors and deliver humanlike avatar control within UT2004. Details about the implementation of this architecture in a network of spiking neurons are given in Section IV. Because of space constraints, only a brief description is given of most of the specialized modules, with full details available in [43].

Fig. 2. NeuroBot's global workspace architecture.

A. Sensory Input and Output

Instead of the rich visual interface provided to human players, bots have direct access to the avatar's location, direction, and velocity, as well as the position of all currently visible items, players, and navigation points. These data are provided in an absolute 3-D coordinate frame, as shown in Fig. 3. Bots can query the spatial structure of the avatar's immediate environment using range-finder data, and they also have access to proprioceptic data (crouching, standing, jumping, shooting) and interioceptic data (health, adrenaline level, armor), which are provided as scalar values. The avatar can be controlled by sending commands specifying its movement, rotation, shooting, and jumping.

To use these data within our system it is necessary to convert the data from the UT2004 environment into patterns of spikes that are passed to the neural network. To begin with, a number of preprocessing steps are applied to the input data. First, information about an enemy player or other object is converted into a direction and distance measurement that is relative to the avatar's location. Second, a measure of pain is computed as the rate of decrease of health plus the number of hit points. Third, the velocity of the avatar is calculated and a series of predefined cast rays are converted into a simulated range-finder sensor. Fourth, each piece of information is transformed into a number in the range 0-1 that is multiplied by a weight.

The spike conversion of the preprocessed data builds on previous work in this area [44], [45], and it is carried out using either a rate or a population code. Simple scalar values, such as whether shooting is taking place or the current health level, are converted into a mean firing rate (spikes/millisecond), which is used to deliver spikes governed by a Poisson distribution to neurons representing the corresponding sensory data.
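The two encoding schemes can be sketched in C++ as follows. This is an illustrative fragment rather than NeuroBot's code: the function names are ours, and the Gaussian tuning curve and tuning width used for the population code (described in the next paragraph) are our assumptions.

    #include <algorithm>
    #include <cmath>
    #include <random>
    #include <vector>

    const double kTwoPi = 6.283185307179586;
    std::mt19937 rng{42};

    // Rate code: a preprocessed scalar in [0, 1] (e.g., the health level)
    // sets a mean firing rate in spikes/ms; in each 1-ms step a neuron
    // then fires with that probability (Poisson-like spike generation).
    std::vector<bool> rateEncode(double value, double maxRate, int numNeurons) {
        std::bernoulli_distribution fire(std::min(1.0, value * maxRate));
        std::vector<bool> spikes(numNeurons);
        for (int i = 0; i < numNeurons; ++i) spikes[i] = fire(rng);
        return spikes;
    }

    // Population code: an egocentric direction (radians, relative to the
    // current view direction) is spread over a 1-D vector of neurons that
    // covers 360 degrees, with salience scaling the peak firing rate.
    std::vector<double> populationEncode(double direction, double salience,
                                         int numNeurons) {
        const double width = kTwoPi / 36.0;  // tuning width: an assumption
        std::vector<double> rates(numNeurons);
        for (int i = 0; i < numNeurons; ++i) {
            double preferred = kTwoPi * i / numNeurons - kTwoPi / 2.0;
            double d = std::remainder(direction - preferred, kTwoPi);
            rates[i] = salience * std::exp(-(d * d) / (2.0 * width * width));
        }
        return rates;  // mean rates used to drive Poisson spike generation
    }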

When more temporal or spatial precision is required, a population of neurons is used to represent the encoded value. For example, spatial data about the avatar's environment are represented using an egocentric 1-D neuron population vector in which each neuron represents a direction in a 360° view around the avatar, with the directions expressed relative to the current view direction, and the salience indicated by the firing rate [Fig. 4(a)].

Two different approaches were used to decode the spiking output from the network into scalar motor commands. Simple commands, such as shoot or jump, were extracted by calculating the average firing rate of the corresponding neurons over a small time window. The angles specifying motion and view directions were encoded by populations of neurons and extracted in the following way.

1) Find all contiguous groups of neurons of the population vector, where the firing rate over a small recent time window of each neuron within the group is above a threshold (half the overall average firing rate).
2) For each such group, create a new population vector that inherits only the firing rates of neurons within the contiguous group with rates above the threshold.
3) Calculate the second-order polynomial regression of each such population vector.
4) Calculate the angle at which the regression of the most active group reaches its peak.

An illustration of this algorithm is given in Fig. 4(b), which shows significant activity in four different areas of the population vector encoding the desired motion direction. This method enables our system to accurately identify the center of the most active group of motor output neurons, implementing a winner-takes-all mechanism that ignores small activity peaks in the rest of the population vector.
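A C++ sketch of this decoding procedure is given below. The threshold convention follows step 1; because the regression equation did not survive in the source, the peak of the winning group is approximated here by its rate-weighted centroid, which is our simplification rather than the authors' exact method.

    #include <vector>

    // Decode the most active direction (in egocentric degrees) from a 1-D
    // population vector of time-averaged firing rates (steps 1-4 above).
    // Wrap-around of groups at +/-180 degrees is ignored for brevity.
    double decodeAngle(const std::vector<double>& rates) {
        const int n = static_cast<int>(rates.size());
        double mean = 0.0;
        for (double r : rates) mean += r;
        mean /= n;
        const double threshold = 0.5 * mean;  // step 1: half the average rate

        // Steps 1-2: find the contiguous suprathreshold group with the
        // largest total activity (winner-takes-all across groups).
        int bestStart = -1, bestLen = 0;
        double bestMass = 0.0;
        for (int i = 0; i < n;) {
            if (rates[i] <= threshold) { ++i; continue; }
            int j = i;
            double mass = 0.0;
            while (j < n && rates[j] > threshold) mass += rates[j++];
            if (mass > bestMass) { bestMass = mass; bestStart = i; bestLen = j - i; }
            i = j;
        }
        if (bestStart < 0) return 0.0;  // no group above threshold

        // Steps 3-4: locate the peak of the winning group; the regression
        // peak is approximated by the rate-weighted centroid.
        double num = 0.0, den = 0.0;
        for (int k = 0; k < bestLen; ++k) {
            num += (bestStart + k) * rates[bestStart + k];
            den += rates[bestStart + k];
        }
        double index = num / den;            // sub-neuron peak position
        return (index / n) * 360.0 - 180.0;  // map to egocentric degrees
    }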

B. Global Workspace

The global workspace (Fig. 2) facilitates cooperation and competition between the behavior modules. In combination with the action selection module, it ensures that only a single behavior module, or a coherent coalition of behavior modules, is in control of the motor output. Based on the workspace model in [11], the architecture consists of four layers of neurons linked with topographic excitatory connections (to preserve patterns of activation) and diffuse inhibitory connections (to decay patterns of activation over time). The information in each layer is identical, and the four layers are used as a kind of working memory that enables information to reverberate while the modules compete to broadcast their information. Each workspace layer is divided into areas that represent four different types of globally broadcast information: 1) desired body state (the output data that are sent to control the next position of the avatar); 2) proprioceptive data (the avatar's current position and movements); 3) exteroceptive data (the state of the observable environment); and 4) interoceptive data (the internal state of the avatar).

C. Behaviors

The behavior of NeuroBot is determined collectively by a number of specialized modules, which read sensory data from the global workspace and compete, either individually or as part of a coalition, to control the avatar. The competition is carried out by comparing the activity of the salience layers in each module, with the most salient module winning access to the workspace and being executed. The action selection module ensures that contradictory commands, such as jumping and not jumping, cannot be carried out. The design of many of the modules was inspired by Braitenberg vehicles [46], which use patterns of excitatory and inhibitory wiring to produce attraction and repulsion behaviors that have good tolerance to noise. The following paragraphs outline NeuroBot's current behaviors; a sketch of the gating pattern that many of them share follows the list.

Fig. 3. Game environment. On the left is a map showing the data that are available to bots through the GameBots 2004 server (see Section IV-B). This includes the location of the avatar (black circle), other players (red circles), navigation points in view (cyan squares), and out of view (green squares), and wall locations gathered from ray tracing. On the right is the in-game view that is displayed to human players.

a) Moving: This determines whether the avatar moves or remains stationary. Its salience is never sufficient for the module to win access to the workspace on its own, since it only generates a desire to move without providing any direction for the movement. This module contains a layer of neurons with a low level of activity, which send spikes to the salience layer.

b) Navigation: This module (Fig. 5) controls the avatar's direction of motion, enabling it to follow walls and avoid collisions. A coalition between the moving and navigation modules can cause the avatar to move in a particular direction.

c) Exploration: In situations without any observable enemies or other interesting items, this module enables the avatar to exhibit exploratory behavior. The architecture here is similar to the navigation module, except that the activation of its salience layer is proportional to the activation of neurons representing the level of health, which avoids exploratory behaviors when the avatar's health is low.

d) Firing: This module enables the avatar to attack or defend itself when threatened by enemies. Using similar mechanisms to the previous cases, the salience layer is activated only when the observed direction of the enemy is aligned with the current view. To achieve this, only the central neurons of the egocentric neuron population vector representing the direction of the enemy are connected to the inhibitory layer, and the activation of the salience layer is proportional to the enemy proximity and the level of health. Hence, if an enemy is close and the avatar has sufficient health, it will fire.

e) Jumping: The salience of this module increases when the moving module is active and there is low horizontal velocity, causing the avatar to jump over an obstacle or escape from a situation in which it is stuck. This module also increases in salience when an approaching enemy is observed. In this situation, jumping makes it harder for the enemy to hit the avatar.

f) Chasing: This module causes the avatar to follow enemies that try to flee, for example, due to falling health. The salience layer is activated when the avatar's own health is good and an enemy is nearby or moving away, at which point the chasing module drives the desired direction of view toward the fleeing player and influences the desire to look in the direction of this player. The aggregate behavior depends on the salience of the moving and firing modules. For example, when this module is cooperating with the moving module, the avatar will start chasing the enemy. On the other hand, if the view direction is aligned with the direction of the enemy, then the salience of the firing module will increase and the avatar will start shooting.

g) Fleeing: The fleeing module (Fig. 6) increases in salience when an enemy is near and the level of health is low. When it is active, it drives the motion direction in the opposite direction to the source of immediate danger.

h) Recuperation: The recuperation module activates when health is low and the location of health vials is known. When it is active, this module drives the view and motion direction toward observed health vials. This module usually forms a coalition with the fleeing and moving modules, resulting in an active fleeing behavior that looks for health vials while it tries to avoid the attacking enemy.

i) Item Collection: This controls the approach and collection of items. When an item is noticed, the salience layer becomes active and the module aligns the avatar's view direction with the item. When this module forms a coalition with the moving and navigation modules, the avatar will approach and collect the target item.

j) U-Turn: This module enables the avatar to escape from situations in which it is surrounded by walls and cannot decide which direction to turn. This usually means that the avatar is trapped and an obvious approach is to reverse the direction of motion. This module is also activated when the avatar is being attacked by an unseen enemy.
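The gating pattern that recurs in these modules (cf. Figs. 5 and 6) can be summarized abstractly as follows. This rate-based C++ sketch is only our illustration of the wiring logic; the actual modules are manually engineered spiking circuits, and the function names and threshold are assumptions.

    #include <vector>

    // Gating pattern shared by the behavior modules: an excitatory layer A
    // relays a direction vector to the workspace, an inhibitory layer A1
    // blocks A by default, and an active salience layer drives A2, which
    // silences A1 and lets the vector through.
    std::vector<double> gateBySalience(const std::vector<double>& direction,
                                       double salience,  // salience-layer activity
                                       double salienceThreshold) {
        std::vector<double> out(direction.size(), 0.0);
        bool a1Suppressed = salience > salienceThreshold;  // A2 inhibits A1
        if (a1Suppressed) {
            out = direction;  // A1 can no longer stop A: vector is broadcast
        }
        return out;  // otherwise A stays silenced and nothing is sent
    }

    // For the fleeing module, the enemy direction is "diametrically"
    // connected to A, so the broadcast vector points away from the enemy:
    // rotating an egocentric population vector by 180 degrees is a
    // half-length circular shift.
    std::vector<double> diametric(const std::vector<double>& enemyDirection) {
        std::vector<double> away(enemyDirection.size());
        size_t half = enemyDirection.size() / 2;
        for (size_t i = 0; i < enemyDirection.size(); ++i)
            away[(i + half) % away.size()] = enemyDirection[i];
        return away;
    }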

D. Action Selection

The action selection module facilitates the competition between behavior modules for access to the global workspace: selecting modules based on the activity in their salience layers and preventing contradictory actions from being implemented simultaneously. The internal structure of this module consists of an excitatory input layer and an inhibitory layer of the same size. The input layer is connected to the inhibitory layer, and it receives connections from the workspace neurons that hold information about the desired state of the system. These workspace neurons also receive connections from the inhibitory layer, as shown in Fig. 7.

Fig. 4. Encoding/decoding of spatial data using an egocentric 1-D population code. A vector of neurons represents a range of spatial directions with respect to the current direction of view. (a) Example of how firing activity in a neuron population encodes the presence of a wall to the right, where firing activity is inversely proportional to distance. (b) Example of decoding the value that represents the desired motion direction.

Fig. 5. Navigation module. Workspace neurons representing range-finder measurements are topographically connected to an excitatory layer A, which passes the pattern to the desired motion direction. This causes the avatar to move in the direction where the most empty space has been observed. The same workspace neurons also excite the inhibitory layer A1, which inhibits A. Thus, in a situation where the salience layer is not active, A is not able to send spikes back to the workspace node. The salience layer is activated when the moving module is also active, when it sends spikes to the inhibitory layer A2, which then inhibits A1. As a result, A1 becomes unable to stop A from sending the desired motion direction back to the workspace, and the entire module becomes active.

Fig. 6. Fleeing module. The enemy direction is diametrically connected to a layer of excitatory neurons A, which sends back to the workspace node the direction that the avatar should point in order to flee from the enemy. The enemy direction also inhibits the A layer through the inhibitory layer A1 to disable the broadcast of this vector in cases where the salience layer is not active. The level of activation of the salience layer also depends on enemy proximity and the level of health.

Fig. 7. Action selection module.

IV. IMPLEMENTATION

The neuronal global workspace described in Section III was implemented using a spiking neural network that was modeled with the NeMo simulator. In the current system, the connection patterns and weights were manually engineered to produce the desired behaviors; some suggestions about how the system could learn are given in Section VI. A custom C++ interface was created to communicate with the GameBots 2004 server.

A. Spiking Neural Network

Spiking neural networks were selected for this system because they have a number of advantages over rate-based models [47], [48], including their efficient representation of information in the form of a temporal code [49] and their ability to use local spike-time-dependent plasticity learning rules to identify relationships between events [50]. It is also being increasingly recognized that precise spike timing plays an important role in the biological brain [51]-[53], and the implementation in our system of spiking neurons enabled us to take advantage of high-performance CUDA-accelerated spiking neural simulators that can model large numbers of neurons and connections in real time [54], [55]. The feasibility of this approach has been demonstrated by other global workspace implementations based on spiking neurons [11], [13]-[15], which provided a source of inspiration for the current system.

The neurons in the network were simulated using the Izhikevich model [56], which accurately reproduces biological neurons' spiking response without modeling the ionic flows in detail. This model is computationally cheap and suitable for large-scale network implementations [57], [58]. The parameters in Izhikevich's model can be set to reproduce the behavior of different types of neurons, which vary in terms of their response characteristics. In our simulation, we used two types of neurons, excitatory and inhibitory, which have different behaviors and connection characteristics.
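The Izhikevich model [56] is defined by v' = 0.04v^2 + 5v + 140 - u + I and u' = a(bv - u), with the reset v = c, u = u + d after a spike when v reaches 30 mV. A minimal C++ implementation is shown below; the excitatory and inhibitory parameter sets are the published regular-spiking and fast-spiking defaults from [56], and NeuroBot's exact values may differ.

    // Izhikevich neuron [56]: two coupled variables, quadratic membrane
    // dynamics, and an after-spike reset.
    struct IzhikevichNeuron {
        double v = -65.0, u;  // membrane potential (mV) and recovery variable
        double a, b, c, d;    // parameters selecting the response type

        IzhikevichNeuron(double a_, double b_, double c_, double d_)
            : a(a_), b(b_), c(c_), d(d_) { u = b * v; }

        // Advance one 1-ms step with input current I; returns true on a
        // spike. Two 0.5-ms substeps for v improve numerical stability,
        // as in the original paper [56].
        bool step(double I) {
            for (int k = 0; k < 2; ++k)
                v += 0.5 * (0.04 * v * v + 5.0 * v + 140.0 - u + I);
            u += a * (b * v - u);
            if (v >= 30.0) { v = c; u += d; return true; }
            return false;
        }
    };

    // Published defaults: regular-spiking excitatory cells and
    // fast-spiking inhibitory cells.
    IzhikevichNeuron makeExcitatory() { return {0.02, 0.2, -65.0, 8.0}; }
    IzhikevichNeuron makeInhibitory() { return {0.1, 0.2, -65.0, 2.0}; }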

The spiking neural network was simulated using the NeMo spiking neural network simulator [54], [59], which was developed by some of the authors of this paper and is primarily intended for computational neuroscience research. NeMo is a high-performance simulator that runs parallel simulations on graphics processing units (GPUs), which provide a high level of parallelism and very high memory bandwidth. The use of GPUs makes the simulator eminently suitable for implementing neurally based cognitive game characters in first-person shooters and similar games since the required parallel hardware is typically already present on the player's computer.3

NeMo updates the state of the neural network in a discrete-time simulation with a 1-ms time step. While NeMo can simulate the NeuroBot network of 20 000 neurons and 1.5 million connections in real time on a high-end consumer graphics card, the NeuroBot network typically ran using a 5-ms time step, effectively simulating the network at a fifth of the normal speed. This was mainly due to the computing requirements of its other components. The parameters of the network were adjusted to produce the desired behavior at this update rate.

3 To check that UT2004 and NeuroBot could run smoothly on the same computer we measured the extra computational load of NeuroBot on a computer with an Intel Core 2 Quad Q6600 (2.40 GHz) processor, 8.00-GB RAM, GeForce GT 240, and CUDA support. Without NeuroBot, UT2004 was running at the maximum rate of 90 frames/s. When NeuroBot joined the game, the frames per second reduced to 87, and the average central processing unit (CPU) usage increased from 5% to 13%. On a second computer with an Intel Core i7-2600 (3.40 GHz) processor, 8.00-GB RAM, and no CUDA support, NeuroBot reduced the frames per second of the UT2004 client from 89 to 85 and the average CPU usage increased from 10% to 20%.

Fig. 8. System overview.

B. UT2004 Interface

Our system uses sockets to communicate with the GameBots 2004 server [60], which interfaces with UT2004 (Fig. 8). In the case of a bot connection, GameBots sends synchronous and asynchronous messages containing information about the state of the world, and it receives commands that control the avatar in its environment. To facilitate this communication, we created a C++ wrapper for GameBots 2004, which contains a syntactic analyzer that parses the GameBots messages and updates the attributes of the corresponding objects that form the system's representation of the world. The system then converts the gathered information into spikes that are passed into the network, and spikes from the network are converted back into scalar values that are sent to the GameBots server. The communication loop with GameBots is asynchronous and runs independently of the neural network update.
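A GameBots message is a line of text consisting of a message type followed by attribute-value pairs. The following C++ fragment sketches how such a line could be parsed; the brace-delimited format shown in the comment is a simplified illustration of the protocol, and this parser is our sketch, not the syntactic analyzer distributed with NeuroBot.

    #include <iterator>
    #include <map>
    #include <sstream>
    #include <string>

    // Parse one GameBots-style text message of the general form
    //   TYPE {Attr1 value1} {Attr2 value2} ...
    // e.g. "SLF {Health 100} {Location 250.0,80.0,40.0}". The exact message
    // types and attributes are defined by the GameBots 2004 documentation.
    struct Message {
        std::string type;                          // e.g. a self- or player-update
        std::map<std::string, std::string> attrs;  // attribute -> raw value
    };

    Message parseMessage(const std::string& line) {
        Message msg;
        std::istringstream in(line);
        in >> msg.type;  // leading message type
        std::string rest((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());
        size_t pos = 0;
        while ((pos = rest.find('{', pos)) != std::string::npos) {
            size_t end = rest.find('}', pos);
            if (end == std::string::npos) break;  // malformed: ignore tail
            std::string body = rest.substr(pos + 1, end - pos - 1);
            size_t space = body.find(' ');        // attribute name, then value
            if (space != std::string::npos)
                msg.attrs[body.substr(0, space)] = body.substr(space + 1);
            pos = end + 1;
        }
        return msg;
    }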

Fig. 9. Results from BotPrize 2011. (a) First trial (August 10, 2011). (b) Second trial (August 17, 2011). (c) Final (September 3, 2011).

Fig. 10. Pairwise similarity between five human subjects (h1-5), the average of the human subjects (h-all), NeuroBot (NB), and Hunter (Htr) for the different statistical measures. The numbers are the p-values that were computed from the data recorded across the runs. A p-value greater than 0.05 indicates that there is no statistically significant difference between the two sets of measurements.

V. MEASUREMENTS OF NEUROBOT'S HUMANLIKE BEHAVIOR

The humanness of NeuroBot's behavior was evaluated by competing in the 2011 BotPrize competition, by participating in a conference panel session, and by using statistical measures to compare the behavior of avatars controlled by humans and by NeuroBot.

A. Human Judgements About NeuroBot's Behavior

The humanlike behavior of NeuroBot was first evaluated by competing in the 2011 BotPrize competition [8]. In this competition, human judges play the game and rate the humanness of other avatars, which might be controlled by a human or a bot, with the most humanlike bot winning the competition. The 2011 BotPrize competition consisted of two trial rounds, which were mainly intended to establish that everything was working as expected, and the actual competition. We have included the results from the trials as well as the final competition to give a better idea about the variability.

The results shown in Fig. 9 indicate that NeuroBot was a highly ranked competitor, achieving first or second place in the trials and a close second in the final. In the last round of the final, we experienced significant technical problems caused by a communication delay between our laptop running NeuroBot at the competition in Seoul and the UT2004 server in Europe (a problem that some of the other teams addressed by running their bots from machines closer to the server). This problem could only be partially addressed by running NeuroBot on the BotPrize server, which substantially compromised its ability to run in close to real time and produce humanlike behavior. Without this issue, NeuroBot might have achieved a significantly higher humanness rating, something that we intend to test in future competitions. The BotPrize results also showed that the Epic bots shipped with UT2004 performed slightly better than the competitors and that the human players were rarely rated as 100% human: the average humanness of the humans varied from 50% to 70%, with some rated as low as 20%.


We also carried out an informal evaluation of NeuroBot at the 2011 SGAI International Conference on Artificial Intelligence as part of a panel session on the Turing test. An audience of around 50 conference delegates was shown an avatar controlled by NeuroBot on one randomly selected screen and an avatar controlled by a human on another randomly selected screen. At the end of each 5-min round the audience was asked to vote on which of the avatars was controlled by a human and which was controlled by a bot and to explain the reasons for their choice. In the first round, one avatar was controlled by an amateur human player, and the majority of the audience incorrectly believed that the NeuroBot avatar was being controlled by a human. In the second round, the other avatar was controlled by a skilled player and the majority of the audience correctly identified the human-controlled avatar. In the second case, many audience members reported that their decision was based on information supplied by the chair that the human player in the second round was highly skilled, which led them to select the most competent avatar. One audience member reported that his correct identification of the human player had been based on minor glitches in NeuroBot's behavior.

These evaluations suggest that humans often make incorrect judgements about whether an avatar is being controlled by a human or a bot and they are often influenced by the skill level of the player, which is highly variable between humans. In the BotPrize competition, there is a further problem that the humans are carrying out a double role in which they are supposed to be playing the game as normal players while they make judgements about the behavior of the bots. These roles are to some extent contradictory since accurate judgements are likely to depend on close observation of other players from a relatively safe place in a static position. This will tend to reduce the number of judgements that are made about judges' humanness, and make human players behave differently from human judges.4

4 Perhaps this is why the winning ICE-CIG2011 bot was based on human judges, not human players [37].

B. Statistical Measures of Humanlike Behavior

The variability of human judgements and human behavior led us to develop a number of statistical measures of avatars' behavior in UT2004, which were used to compare the behavior of NeuroBot with humans and to explore the behavior differences between human players. Implementation sketches for the exploration factor and path entropy are given after this list.

• Exploration factor (EF): While humans smoothly navigate around UT2004 using the rich visual interface, avatars guided by range-finder data are likely to get stuck on walls and corners and fail to explore large areas of the game environment; for example, a wall-following bot might endlessly circle a block or column. The EF measure was developed to quantify the extent to which an avatar visits all of the available space in its environment.
A GameBots map in UT2004 has predefined navigation points that are spread throughout the accessible areas and placed in strategic positions to help simple software bots navigate. The first stage in the calculation of EF is a Voronoi tessellation of the map, which divides it into navigation areas, which contain the points that are closest to each navigation point. EF is then averaged over a number of time periods as follows:

EF = (V / N) · (1 / P) · Σ_{i=1}^{P} v_i / (t_i + 1)

where v_i is the number of different navigation areas visited in time period i; t_i is the number of transitions between navigation areas in each time period (this is incremented by 1 to ensure that EF lies between 0 and 1); V is the number of different navigation areas visited in the entire testing period; N is the total number of navigation areas in the map, which is the same as the number of navigation points; and P is the number of time periods over which EF is calculated. In our experiments, each time period was 1 min in duration.
EF has a minimum of 0 and a maximum of 1. If a player visits all of the navigation areas once, its exploration factor will be 1, which means that the avatar has followed the best possible path to explore the map. Contrariwise, if the avatar alternates between two navigation areas in a much larger map, then the total number of visited navigation areas will be 2 and the exploration factor will be close to 0. If the player visits more navigation areas, but most of the time alternates between two navigation areas, then the average over the time periods will tend toward 0, and the exploration factor will again be close to 0.

• Stationary time: It is likely that human-controlled avatars will exhibit characteristic patterns of resting and movement that will differentiate them from bots. The total amount of time that the avatar was stationary during each round was recorded to determine if this is a useful measure.

• Path entropy: It is likely that human players will exhibit more variability in their movements, being attracted toward salient features of their environment, such as doors, staircases, or objects, whereas bots are more likely to exhibit stereotyped movements that are linked to the game's navigation points. Path entropy was measured by dividing the map into squares and recording the probability p_i that the avatar would be in each square i during each round [42]. The information entropy was then calculated using the standard formula

H = −Σ_i p_i log2 p_i.

• Average health, number of kills, number of deaths: The average health together with the number of kills can be loosely interpreted as the amount of positive rewards that a player has received from their environment, with the number of deaths being the amount of negative rewards. The balance between positive and negative rewards is a measure of a player's skill, which has been linked with intelligence in some of the universal intelligence measures that have been put forward [61], [62]. While humans display a considerable range of skill within UT2004, there is likely to be a maximum limit, which could only be exceeded by a bot with faster reactions, greater precision, and access to better data. A basic bot that was often trapped in corners would be likely to exhibit less skill than the majority of humans.
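The two computed measures follow directly from the definitions above. In the C++ sketch below, the EF expression is the reconstruction given above (the original equation did not survive extraction) and the entropy is the standard Shannon formula; the function and parameter names are ours.

    #include <cmath>
    #include <vector>

    // Exploration factor: EF = (V / N) * (1 / P) * sum_i v_i / (t_i + 1),
    // where v_i and t_i are the distinct areas visited and the transitions
    // in time period i, V is the number of distinct areas visited over the
    // whole test, and N is the total number of navigation areas in the map.
    double explorationFactor(const std::vector<int>& areasPerPeriod,        // v_i
                             const std::vector<int>& transitionsPerPeriod,  // t_i
                             int areasVisitedTotal,                         // V
                             int areasInMap) {                              // N
        const size_t P = areasPerPeriod.size();
        double sum = 0.0;
        for (size_t i = 0; i < P; ++i)
            sum += static_cast<double>(areasPerPeriod[i]) /
                   (transitionsPerPeriod[i] + 1);  // +1 keeps each term in [0,1]
        return (static_cast<double>(areasVisitedTotal) / areasInMap) * sum / P;
    }

    // Path entropy: H = -sum_i p_i * log2(p_i), where p_i is the probability
    // of the avatar occupying grid square i during a round [42].
    double pathEntropy(const std::vector<double>& squareProbabilities) {
        double h = 0.0;
        for (double p : squareProbabilities)
            if (p > 0.0) h -= p * std::log2(p);
        return h;
    }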


TABLE I
PERCENTAGE OF STATISTICALLY SIGNIFICANT MATCHES WITH HUMAN BEHAVIOR FOR EACH HUMAN PLAYER AND FOR NEUROBOT AND HUNTER, SHOWN FOR EACH MEASURE. THE HUMANNESS METRIC ᛗ IS THE AVERAGE TAKEN OVER ALL MEASURES

A number of other statistical measures were tried, which there was not space to present in this paper. These included average enemy distance, number of jumps, average wall distance, number of items taken, and shooting time. Visualization of the avatars' paths and their stopping time was also used to compare the behavior of human- and NeuroBot-controlled avatars.

To test these measures, we recorded data from avatars controlled by humans or bots within the UT2004 map shown in Fig. 11. Ten 5-min rounds were used to collect the data from each human player (h1-h5) and data from 35-min rounds was recorded for NeuroBot and the Hunter bot included with Pogamut. In each round either a human or one of the bots played against the Epic bot that is shipped with UT2004.5

5 While the Epic bot that is included with UT2004 would have been a better target for comparison with NeuroBot, this was not possible in practice because of the limited amount of data that could be recorded from the Epic bot.

Pairwise similarity between the data gathered from human and bot players was then computed using a standard two-tail t-test for unequal variances.

The first thing to note about the results shown in Fig. 10 is that human behavior is not a single unitary phenomenon; the movement patterns, health, kills, and deaths of the human-controlled avatars all varied significantly between players. In particular, h4 displayed a large amount of anomalous behavior compared with the rest of the humans and the bots, probably because she or he was the least skilled player. Some more specific comments about the results are as follows.

• EF: NeuroBot matched most of the human players on this measure, which suggests that it explores the game environment in a reasonably humanlike manner. Interestingly, both Hunter and h4 had higher exploration factors than NeuroBot and the other humans, which suggests that complete systematic exploration of an environment is not a typical human characteristic.

• Stationary time: NeuroBot matched a majority of humans on this measure, which exhibited a substantial amount of variability between humans. Hunter had a poor match to the human data because it was stationary for approximately four times longer than most human players.

• Path entropy: Hunter typically had a lower path entropy than the humans, whereas NeuroBot's path entropy matched the majority of human players. These differences in path entropy are clearly visible in the path traces shown in Fig. 11.

• Average health: Hunter appears to conserve its health in a more humanlike way than NeuroBot, which had a higher average value than the human players.

• Number of kills: NeuroBot and the human players killed a similar number of other players, whereas Hunter typically killed two to three times more players. In this case, playing the game too well led to behavior that was nonhuman in character.

• Number of deaths: NeuroBot typically died more than the human players, whereas Hunter exhibited fairly humanlike behavior, presumably because it could preserve its health better and was more effective at killing other players.

Fig. 11. Typical paths recorded from (a) human, (b) NeuroBot, and (c) Hunter as well as the duration and location of each stop.

The paths and stopping times of NeuroBot, Hunter, and a human player are plotted in Fig. 11. These show that NeuroBot's movements are more qualitatively similar to a human's than Hunter's, which had very tightly constrained behavior. This also explains why Hunter performs badly on the EF, path entropy, and stationary time measures.

To get an overall idea of the extent to which NeuroBot and Hunter exhibited humanlike behavior within UT2004, we developed a humanness metric ᛗ (mannaz, meaning "man" in the runic alphabet), which measures the extent to which the behavior of an avatar matches the behavior of a population of humans, using a particular set of statistical measures. For a bot-controlled avatar, ᛗ is the average percentage of statistically significant matches between the bot and the sampled human players. ᛗ for a human-controlled avatar is the average percentage of statistically significant matches between the human and all of the sampled human players, this time including the human that is being measured, who is an example of a human player.

The results shown in Table I confirm the observation dis-

in Section V-B could also be used to address this issue by gathering a much larger sample of behavior from humans and classifying it according to different skill levels (perhaps using data about average health, deaths, and kills). Comparison between bots and humans with similar levels of skill would be likely to eliminate much of the variability. It would then be possible to calculate the humanness metric of a bot with a particular skill level. ᛗ could then be used in a genetic algorithm to evolve a bot with a particular level of skill, and it could also be used to discriminate between human and software players, perhaps in situations where only humans are supposed to be competing.
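A sketch of the ᛗ calculation is given below. The paper's procedure uses a two-tail t-test for unequal variances (Welch's test); to keep the sketch self-contained, the 5% critical value is approximated here by the large-sample threshold of 1.96 rather than the t distribution with Welch-adjusted degrees of freedom, so this is an approximation of the published procedure, with our own type and function names.

    #include <cmath>
    #include <vector>

    struct Sample { double mean, var; int n; };  // per-player summary for one measure

    // Welch's two-tailed t-test for unequal variances: returns true when
    // the two samples are NOT significantly different at the 5% level,
    // i.e., a statistically significant "match" in the sense of Table I.
    bool matches(const Sample& a, const Sample& b) {
        double se = std::sqrt(a.var / a.n + b.var / b.n);
        double t = (a.mean - b.mean) / se;
        return std::fabs(t) < 1.96;  // large-sample 5% threshold (approximation)
    }

    // Humanness metric for a bot: the percentage of (measure, human) pairs
    // for which the bot's behavior is indistinguishable from the human's.
    // measures[m][h] holds human h's summary statistics for measure m, and
    // botMeasures[m] holds the bot's summary for measure m.
    double humanness(const std::vector<std::vector<Sample>>& measures,
                     const std::vector<Sample>& botMeasures) {
        int total = 0, matched = 0;
        for (size_t m = 0; m < measures.size(); ++m)
            for (const Sample& human : measures[m]) {
                ++total;
                if (matches(botMeasures[m], human)) ++matched;
            }
        return 100.0 * matched / total;  // average over all measures and players
    }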

While humanlike behavior can be a valuable attribute in a computer game avatar, it is not necessarily an ideal benchmark