A Computational Model of the Hybrid Bio-Machine MPMS for Ratbots Navigation

CYBORG INTELLIGENCE

Lijuan Su, Nenggan Zheng, Min Yao, and Zhaohui Wu, Zhejiang University

As a typical cyborg intelligent system, ratbots possess not only their own biological brain but also machine visual sensation, memory, and computation. They could help us understand the memory and learning mechanisms of cyborg intelligent systems.

NOVEMBER/DECEMBER 2014

The biological brain is the most sophisticated, efficient, parallel, and low-energy-consumption system to exhibit advanced cognitive functions. As the technology behind brain-computer interfaces becomes more and more mature, it provides new ways to realize the direct connection between man-made devices (artificial intelligence) and biological brains (biological intelligence). Researchers have proposed a concept and architecture for cyborg intelligence that integrates biological intelligence with artificial intelligence.1 Through complementary integration, cyborg intelligence can solve complicated problems in general environments, which neither biological nor artificial intelligence systems can tackle alone.
A typical cyborg intelligence system is the ratbot, which is a rat with electrodes implanted in the medial forebrain bundle of its brain.2,3 The electrodes are connected to an embedded backpack fixed on the rat to deliver the stimulation pulses that give a reward according to rat behavior. The real-time reward related to the rat’s transient action can affect the animal’s learning and memory processes. In a goal-oriented task, for example, ratbots can reach the goal faster and learn the optimal path in fewer trials.
The strong performance exhibited by ratbots in maze-learning tasks comes from their novel memory and learning system, which is built up by introducing a real-time medial forebrain bundle (MFB) reward into the existing multiple parallel memory system in their biological brains. However, there’s a research gap between the computational model and experimental results. We aim to create the computational process for explaining the mechanisms underlying a ratbot’s superior learning performance and figuring out how the input information is processed to generate behavior.


Multiple Parallel Memory
Systems in Ratbot Navigation
Neurobiological results suggest that the
memory system in rat brains is composed
of several distinct anatomically and functionally dissociable subsystems.4 The current classical version of the multiple parallel
memory systems (MPMSs) theory hypothesizes three central structures: the hippocampus, the dorsal striatum, and the amygdala.5
The respective neural circuits of these three structures encode and process specialized memory information and generate rat behavior to the outside world.

Figure 1. Concept of the hybrid bio-machine multiple parallel memory system (MPMS). (Diagram labels: Computer, Hippocampus, Amygdala, Striatum, Distance to walls, State and reward, Motor, Landmark, Environment.)

1541-1672/14/$31.00 © 2014 IEEE
Published by the IEEE Computer Society

The hippocampus subsystem is assumed to represent the relationship between stimuli and events from physical environments. For example, a particular location can be stored by place cells; when the rat passes through this place, the corresponding place cells fire significant bursts of action potentials.6
The second subsystem, with the dorsal striatum as its central structure,
represents stimulus-response (S-R) relationships.5 When first faced with an
environmental stimulus, animals might
respond accidentally or instinctually.
But if they encounter a reinforcer, the
association between the stimulus and
accidental response will be strengthened. The next time the animal is in
the presence of the same stimulus, it
will be more likely to exhibit the same
response.
The relationship between neutral stimuli and reinforcers is represented in the third subsystem, with the amygdala as its central structure. Once associated with the reinforcer, a neutral stimulus can also evoke conditioned responses similar to those initially elicited by the reinforcer.
The hybrid bio-machine MPMS in ratbots is constructed from the combination of the aforementioned neural circuits and the computer-delivered MFB reward loop. As Figure 1 shows, the physical relationships among environmental cues (barriers and walls in the maze) in the hippocampus subsystem form spatial maps that direct ratbots’ ongoing navigation behavior. In behavioral experiments, the reward association between a landmark and the right choice in the maze is processed by the striatum subsystem, making the ratbot more likely to select the correct route as the training time increases. By introducing computer-controlled MFB rewards into ratbots, the reinforcer distribution in the maze learned by the computer guides the rat to search for higher reinforcers with stronger MFB stimuli, represented by the amygdala subsystem in the rat’s biological brain and the MFB reinforcer distribution map in the computer.


Computational Model of the
Hybrid Bio-Machine MPMS
in Ratbots
Figure 2 depicts the computational model of a ratbot’s MPMS. By introducing real-time MFB stimulation into the rat brain, the new hybrid bio-machine MPMS encodes environment sensation inputs as neural representations in the rat’s brain or a map in the computer, integrates the representations into various associations, and then generates the motor selection. Three subsystems complete the environmental information encoding, memory association forming, and motor output.
www.computer.org/intelligent
The hippocampus subsystem is responsible for processing allothetic and idiothetic information.7 Allothetic place cells (APCs) encode distances to walls and barriers, while idiothetic place cells (IPCs) represent the current position based on the rat’s movement speed and direction. Hippocampus place cells (HPCs) integrate these two pathways into spatial memory by associating related representations.
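As a rough illustration of the idiothetic pathway, the sketch below tracks a position from speed and heading samples alone and feeds it to a Gaussian-tuned cell. It is a discrete-time approximation under stated assumptions, not the authors' implementation: the function names, the time step `dt`, and the use of Euclidean distance to a field center are all choices made for this example.

```python
import math


def integrate_position(x, y, speeds, headings, dt=1.0):
    """Dead-reckon a new (x, y) from speed and heading samples.

    Each sample moves the agent by speed * dt along its heading (radians).
    """
    for v, beta in zip(speeds, headings):
        x += v * math.cos(beta) * dt
        y += v * math.sin(beta) * dt
    return x, y


def ipc_activity(position, center, sigma):
    """Gaussian tuning of one idiothetic place cell around its field center."""
    dx = position[0] - center[0]
    dy = position[1] - center[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))


# Hypothetical movement: two steps east, then one step north.
pos = integrate_position(0.0, 0.0,
                         speeds=[1.0, 1.0, 1.0],
                         headings=[0.0, 0.0, math.pi / 2])
activity_at_center = ipc_activity(pos, center=pos, sigma=0.5)
```

A cell whose field center coincides with the estimated position responds maximally, and the response falls off smoothly as the estimate moves away from the center.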
The dorsal striatum subsystem encodes the relationships between landmarks and actions. The position of landmarks in the environment is represented by landmark cells (LCs).8 Given a landmark stimulus in a maze, the rat is rewarded for responding with a specific movement action. The strengthened associations between the landmark and the corresponding action are represented by larger interconnection weights between dorsal striatum cells (DSCs) and action cells (ACs) in the computational model.
IEEE INTELLIGENT SYSTEMS
In the third subsystem, the computer (that is, the backpack) senses environment cues in the physical world to recognize the rat’s current position and movement actions. The computer also learns a reward map (position s, reward r) for the experiment scenario using a Q learning algorithm, updated iteratively as the rat navigates the maze. For each rat state at position s, the computer delivers the real-time electric stimulus r to the MFB of the rat’s brain according to the reward map. The real-time virtual MFB reward affects the amygdala to update the connection strength between spatial memory (output from the hippocampus and dorsal striatum subsystems) and reward prediction in the rat brain. When selecting the actions (AC in Figure 2), the rat will prefer the movement action with maximum real-time reward prediction and run in the corresponding direction. The computer will recognize this movement action as input for the successive iterations of real-time MFB rewards.
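The reward-map learning described above can be sketched as a tabular Q-learning update. This is a minimal illustration rather than the authors' code: the integer state labels, the `"left"`/`"right"` action names, the `q_update` helper, and the values of `alpha` (learning rate) and `beta` (discount factor) are assumptions made for the example.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, beta=0.9):
    """One tabular Q-learning step; returns the updated Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + beta * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


actions = ["left", "right"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0

# Hypothetical transition: turning right at decision point 0 leads to
# decision point 1 and earns a reward of 1.0.
first = q_update(q, state=0, action="right", reward=1.0,
                 next_state=1, actions=actions)
second = q_update(q, state=0, action="right", reward=1.0,
                  next_state=1, actions=actions)
```

Repeating the same rewarded transition moves the stored value toward the reward, which is how a map of (position, reward) pairs can build up over successive runs through the maze.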
Computational Model for the
Hippocampus Subsystem

The hippocampus subsystem is constructed as a connectionist model (see Figure 3),4 with different inputs of space information encoded. In the allothetic pathway, distances to the maze walls are represented by APCs, whereas the proprioceptive inputs (the rat’s own speed and head direction) are processed by IPCs.8 Both APCs and IPCs are projected to HPCs for integrating their representations into a spatial memory of the current position. HPC activity is linked to the input of ACs to generate motor output.
Allothetic information about environment geometric properties plays an important role in hippocampal spatial representation. The environment’s geometric information is encoded as the distances from the rat’s current position to the surrounding maze walls $(d_1, d_2, \ldots, d_{N_A})$ in $N_A$ direction angles $(\varphi_1, \varphi_2, \ldots, \varphi_{N_A})$. The firing rate of APC $j$ is computed as

$$a_j^{APC} = \exp\left[-\frac{1}{N_A}\cdot\frac{\left(\sum_{k=1}^{N_A}\left(d_k^j - d_k\right)\right)^2}{2\sigma_{APC}^2}\right], \tag{1}$$

where $d_1^j, d_2^j, \ldots, d_{N_A}^j$ are the distance fields in $N_A$ directions for APC cell $j$, and $\sigma_{APC}$ is the width of the APCs.
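The APC tuning in Equation 1 can be sketched in a few lines of Python. The sketch reads the equation as squaring the summed distance differences and scaling the result by $1/N_A$ and $2\sigma_{APC}^2$; that reading, the function name `apc_activity`, and the sample distances are assumptions of this example.

```python
import math


def apc_activity(stored, current, sigma):
    """Tuning of one allothetic place cell to a set of wall distances.

    stored  -- distances d_k^j that cell j is tuned to, one per direction
    current -- distances d_k measured at the agent's present position
    sigma   -- width of the tuning curve
    """
    n = len(stored)
    diff_sum = sum(dj - d for dj, d in zip(stored, current))
    # Assumed reading of Equation 1: square the summed difference,
    # then scale by 1/N_A and by 2 * sigma^2.
    return math.exp(-(diff_sum ** 2) / (n * 2.0 * sigma ** 2))


# Hypothetical distances (in arbitrary units) along two directions.
at_field = apc_activity([1.0, 2.0], [1.0, 2.0], sigma=1.0)
off_field = apc_activity([1.0, 2.0], [0.5, 1.5], sigma=1.0)
```

Under this reading, differences of opposite sign can cancel in the sum. If the intended form is a mean of squared differences, only the `diff_sum` line and the exponent change.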
In the idiothetic pathway, suppose the agent (that is, the rat or ratbot) is moving at speed $v(t)$ in the direction angle $\beta(t)$. Its current position $p(t) = (x(t), y(t))$ is computed from the previous position $p(t-1)$:

$$x(t) = x(t-1) + \int_{t-1}^{t} v(\mu)\cos(\beta(\mu))\,d\mu, \tag{2}$$

$$y(t) = y(t-1) + \int_{t-1}^{t} v(\mu)\sin(\beta(\mu))\,d\mu. \tag{3}$$

The activity of IPC $i$ is computed as follows:

$$a_i^{IPC} = \exp\left(-\frac{\left(p(t) - p_i\right)^2}{2\sigma_{IPC}^2}\right), \tag{4}$$

where $p(t)$ is the agent’s current position, $p_i$ is the center field of IPC $i$, and $\sigma_{IPC}$ is the width of the IPCs.

Figure 2. Computational model of the hybrid bio-machine MPMS. AC = action cells; AMC = amygdala cells; APC = allothetic place cells; DSC = dorsal striatum cells; HPC = hippocampus place cells; IPC = idiothetic place cells; LC = landmark cells; MFB = medial forebrain bundle; and QL = Q learning algorithm. (Diagram labels: Environment, d (walls), d(L), θ(L), S, r, Sensation encoding, APC, IPC, LC, QL, MFB, Memory formation, HPC, DSC, AMC, AC, Action output.)

With inputs from two different representations of spatial information in APC and IPC populations, the HPC populations represent the current location. For an HPC cell indexed $m$ in the population, we calculate

$$a_m^{HPC} = \sum_{i=1}^{N_{APC}} w_{mi}\, a_i^{APC} + \sum_{j=1}^{N_{IPC}} w_{mj}\, a_j^{IPC}, \tag{5}$$

where $a_i^{APC}$ and $a_j^{IPC}$ are the activities of APC cell $i$ and IPC cell $j$. The weights $w_{mi}$ and $w_{mj}$ between HPC $m$ and APC $i$ and IPC $j$ are updated using the Hebbian learning algorithm,9

$$\Delta w_{mi} = \mu\, a_m^{HPC}\left(a_i^{APC} - \Delta w_{mi}\right), \tag{6}$$

$$\Delta w_{mj} = \mu\, a_m^{HPC}\left(a_j^{IPC} - \Delta w_{mj}\right), \tag{7}$$

where $\mu$ is the learning rate, and $a_i^{APC}$, $a_j^{IPC}$, and $a_m^{HPC}$ are the activities of APC cell $i$, IPC cell $j$, and HPC cell $m$, respectively.

Figure 3. Computational model for the hippocampus subsystem in the MPMS. (Diagram: the allothetic pathway (d1, ..., dn) feeds the APCs and the idiothetic pathway (v, theta) feeds the IPCs; both project to the HPCs.)

Computational Model for the
Striatum Subsystem

The striatum subsystem receives the landmark information and associates it with a specific action when the landmark and the reward appear in pairs, as Figure 4 shows. The landmark information is processed by a population of LCs that correspond to directions $\varphi_1, \varphi_2, \ldots, \varphi_{N_{LC}}$, with $\varphi_i = 2\pi i / N_{LC}$. If a landmark is detected in direction $\varphi_i$, the activity of LC $j$ in that direction is 1, that is, $I_i^j = 1$. The firing rate of the DSC populations is computed as

$$a_j^{DSC} = \exp\left[-\frac{1}{N_{LC}}\cdot\frac{\left(\sum_{k=1}^{N_{LC}}\left(I_k^L - I_k\right)\right)^2}{2\sigma_{DSC}^2}\right], \tag{8}$$

where $(I_1^L, I_2^L, \ldots, I_{N_{LC}}^L)$ is the representation of the landmark in $N_{LC}$ directions for the DSCs, and $I_k$ is the current view in DSCs.

Computational Model for the
Amygdala Subsystem

With inputs of rat action $a$ and current state $s$ captured by the computer’s camera at time $t$, the environment Q value $Q(s_t, a_t)$ is calculated by the Q learning algorithm shown in Figure 5, running on the machine, as follows:

$$Q(s_t, a_t) = Q(s_t, a_t) + \alpha\left(r_t + \beta \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)\right), \tag{9}$$

where $r_t$ is the reward, $\alpha$ is the learning rate, and $\beta$ is the discount factor that determines the current value of future rewards. For state $s_t$, the computer calculates $Q(s_t, a_t)$ and delivers the MFB stimulus intensity SI mapped from $Q(s_t, a_t)$.
In the behavioral experiments, to ensure that the ratbots can distinguish different intensities of MFB stimulation, the number of MFB stimulation levels was set to 7 according to our preliminary experiments. Similar to the behavioral experiments, the computational model mapped $Q(s, a)$ to seven groups of SI intensities. The smallest $Q(s, a)$ was mapped to $SI_{low}$, whose value was 10; the largest $Q(s, a)$ was mapped to $SI_{high}$, whose value was 70; and the other $Q(s, a)$ values were computed using linear interpolation.10

Action Selection for Behavior
Outputs by TD Learning

A population of ACs sums up the projections of HPCs and DSCs to determine the agent’s movement direction angle $\phi$:

$$a_l^{AC} = \sum_{i=1}^{N_{HPC}} w_{li}\, a_i^{HPC} + \sum_{j=1}^{N_{DSC}} w_{lj}\, a_j^{DSC}, \tag{10}$$

where $a_l^{AC}$ denotes the activity of action cell $l$, $l = 1, \ldots, N_{AC}$, and $a_i^{HPC}$ and $a_j^{DSC}$ are the activities of HPC $i$ and DSC $j$, which can be calculated using Equations 5 and 8. The weights $w_{li}$ and $w_{lj}$ between AC $l$ and HPC $i$ and DSC $j$ are updated by the temporal difference (TD) learning algorithm,11 depicted in Figure 6.
The state-action value $Q(s, \phi)$ is defined for the state $s$ encoded by the activity of HPCs and DSCs, and the action determined by the movement angle $\phi$. The activity of AC $l$ represents the movement activity in direction $\varphi_l = 2\pi l / N_{AC}$. We get the movement direction $\phi$ of the behavior output encoded by the AC populations from the following equation:

$$\phi = \arctan\left(\frac{\sum_{i=1}^{N_{AC}} a_i^{AC}\sin(\varphi_i)}{\sum_{i=1}^{N_{AC}} a_i^{AC}\cos(\varphi_i)}\right). \tag{11}$$
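The action-cell readout in Equations 10 and 11 can be sketched as a weighted sum followed by a population-vector decode. The function names and sample values below are assumptions made for illustration. The sketch also substitutes `math.atan2` for the plain arctangent of the ratio, because `atan2` keeps the correct quadrant and tolerates a zero denominator.

```python
import math


def ac_activity(w_hpc, a_hpc, w_dsc, a_dsc):
    """Activity of one action cell: weighted sum of HPC and DSC inputs."""
    return (sum(w * a for w, a in zip(w_hpc, a_hpc))
            + sum(w * a for w, a in zip(w_dsc, a_dsc)))


def decode_direction(ac_activities):
    """Population-vector readout of the movement angle, in radians."""
    n = len(ac_activities)
    # Preferred direction of cell l is 2 * pi * l / N_AC, for l = 1..N_AC.
    angles = [2.0 * math.pi * (l + 1) / n for l in range(n)]
    s = sum(a * math.sin(p) for a, p in zip(ac_activities, angles))
    c = sum(a * math.cos(p) for a, p in zip(ac_activities, angles))
    # atan2 replaces arctan(s / c) to preserve the quadrant.
    return math.atan2(s, c)


# Hypothetical weights and activities for a single action cell.
single = ac_activity([0.5, 0.5], [1.0, 0.0], [2.0], [0.25])

# Four action cells with preferred directions pi/2, pi, 3*pi/2, and 2*pi.
north = decode_direction([1.0, 0.0, 0.0, 0.0])
east = decode_direction([0.0, 0.0, 0.0, 1.0])
```

When several cells are active at once, the decoded angle lies between their preferred directions, weighted by activity.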

Evaluation of the
Bio-Machine MPMS
Computational Model
The task in the biological experiments is to reach the goal location in the maze shown in Figure 7, where S is the start location, G is the goal location, the yellow circle is the decision point, and the purple star is the landmark. The landmark is a colored cuboid glass box measuring 6 cm × 6 cm × 1 cm. In the experiments, the landmark box can be put into or removed from the maze, which measures 150 cm × 150 cm × 15 cm. For each trial, the agent begins at the start point and finishes at the goal location. To reach the goal location, the agent must make six decisions from two choices each. One choice leads to the target location and the other leads to a dead end where the agent can only go back to the previous location. The correct directions in different mazes can be changed to conduct different experiments. Figure 7 is maze A, whose correct directions are RLLRLR (R means right, L means left); the correct directions for maze B are LRRLRL and for maze C are LRLRLR. The biological experiments are performed in three groups:

• The control group has no landmark at the decision-making point. The rat learns the optimal path to the goal only through the hippocampus memory subsystem.
• The landmark group has a landmark at each decision-making point to indicate the right direction. The rat learns the path to the goal location from the hippocampus subsystem and striatum subsystem.
• The MFB group has the ratbot learn the goal location through the computer-delivered virtual reward based on the amygdala subsystem and hippocampus subsystem.

Figure 4. Computational model for the dorsal striatum subsystem in the MPMS. (Diagram labels: (I1, I2, ..., In), LC, DSC.)

Figure 5. Computational model for the amygdala subsystem in the MPMS. (Diagram labels: State detection, Action detection, St, at, Q-value table, Q(St, at), Update Q(St, at), Reward level SI.)

In the control group, only the hippocampus subsystem forms the relationships between environment information. With inputs of distances to walls in the allothetic pathway and the rat’s own speed and direction in the idiothetic pathway, the hippocampus subsystem builds the spatial relationships. As Figure 8 shows, the control group needs six trials to make the percentage of right choices higher than 83.3 percent (defined as the high level, which means that at the six decision-making points, the agent makes correct choices at least five times). The trial number for the control group to reach the high level is the same in the computational model and the biological experiment: six trials in all three mazes.
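The high-level criterion (at least five correct choices out of six) is easy to check mechanically. The sketch below scores a trial against the correct turn sequence and finds the first trial that meets the criterion. The helper names and the trial records are hypothetical and invented for illustration; only the maze A sequence comes from the text.

```python
def fraction_correct(choices, correct):
    """Share of decision points where the chosen turn matches the correct one."""
    hits = sum(1 for c, k in zip(choices, correct) if c == k)
    return hits / len(correct)


def first_trial_at_high_level(trials, correct, threshold=5 / 6):
    """1-based index of the first trial meeting the threshold, or None."""
    for index, choices in enumerate(trials, start=1):
        if fraction_correct(choices, correct) >= threshold:
            return index
    return None


MAZE_A = "RLLRLR"  # correct turns for maze A, as given in the text

# Hypothetical trial records, not experimental data.
trials = ["RRLLLR", "RLLRRR", "RLLRLR"]
reached_at = first_trial_at_high_level(trials, MAZE_A)
```

Here the first record has four correct turns, the second has five, and the third has six, so the criterion is first met on the second trial.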
In Figures 8b, 8d, and 8f, the blue line represents the simulated learning process for control rats. In this group, we assume that only the hippocampus subsystem is involved. In the maze-learning task, two kinds of representations of spatial information are encoded by APC and IPC populations. The environmental geometric properties information is passed to the APCs, whose firing rates are computed by using Equation 1. In our simulation, N APCs represent the rat’s current position. The other kind, idiothetic information, is used to compute the activity of IPCs by using Equation 4. With these inputs, the HPC’s activity is calculated by Equation 5 to represent current environmental information. The random initialization of weights between HPCs and APCs/IPCs makes the percentage of right choices in the first trial start randomly, as shown in Figure 8b. After several trials, the weights between HPCs and APCs/IPCs are updated using Equations 6 and 7 to choose the direction that leads to the goal position. As Figure 8b shows, from the sixth trial, the percentage of right choices is greater than 86.85 percent.

Figure 7. Maze-learning task. S is the start location, G is the goal location, the yellow circle is the decision point, and the purple star is the landmark.

Initialize weights randomly in the open interval (0, 1)
for Trial = 1 : N_trial do
    while current state is not the goal do
        Choose the action a defined by Equation 11 with probability ξ, or a random action a with probability 1 − ξ.
        Execute action a and update the time step, t = t + 1.
        Calculate the MFB stimulus intensity SI in the amygdala subsystem.
        Calculate the reward prediction error δ(t):
            δ(t) = R(t) + γ Q(s_t, a_t) + SI − Q(s_{t−1}, a_{t−1}),
        where R(t) is the reward received from state s_{t−1} to s_t at time t and γ is the discount factor.
        Calculate the eligibility trace:
            e_ji(t) = λ e_ji(t − 1) + a_j^AC ∗ a_i,
        where λ is the decay rate of the eligibility trace.
        Update the connection weights w_ji between the HPC/DSC i and AC j populations:
            w_ji = w_ji + µ ∗ δ(t) ∗ e_ji(t),
        where µ is the learning rate.
    end while
end for
Figure 6. The temporal difference (TD) learning algorithm used to update the weights.
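The inner steps of Figure 6 can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: the function names, the nested-list layout of `weights` and `traces`, and the default values of `gamma`, `lam`, and `mu` are assumptions, and the Q values are passed in rather than computed from cell activities.

```python
import random


def td_step(weights, traces, ac, inputs, reward, si, q_now, q_prev,
            gamma=0.9, lam=0.8, mu=0.05):
    """Apply one weight update from Figure 6 and return the TD error.

    weights[j][i] and traces[j][i] link input cell i (HPC or DSC)
    to action cell j; both are updated in place.
    """
    delta = reward + gamma * q_now + si - q_prev
    for j, a_j in enumerate(ac):
        for i, a_i in enumerate(inputs):
            traces[j][i] = lam * traces[j][i] + a_j * a_i
            weights[j][i] += mu * delta * traces[j][i]
    return delta


def choose_action(greedy_action, n_actions, xi, rng=random):
    """Pick the greedy action with probability xi, otherwise a random one."""
    if rng.random() < xi:
        return greedy_action
    return rng.randrange(n_actions)


# Hypothetical single step with one action cell and two input cells.
weights = [[0.5, 0.5]]
traces = [[0.0, 0.0]]
delta = td_step(weights, traces, ac=[1.0], inputs=[1.0, 0.0],
                reward=1.0, si=0.5, q_now=0.0, q_prev=0.5)
```

Only the connection from the active input cell changes in this step, because the eligibility trace of the silent input stays at zero.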
Both the hippocampus and striatum subsystems are involved in the landmark group. As in the control group, the hippocampus subsystem builds spatial relationships between walls in the maze. In the striatum subsystem, the landmark at the decision-making point is associated with the correct movement direction. The activity of the AC encoding this movement direction is increased to the sum of the contributions from these two subsystems. In this computational model, the trial number to reach the high level is the same as in the biological experiment in all three mazes.
In Figures 8b, 8d, and 8f, the green line represents the simulated learning process for rats in the maze with landmarks. In this group, the hippocampus and striatum subsystems are assumed to be involved in our proposed MPMS model. In the maze-learning task, the two subsystems encode the spatial and landmark information. The spatial properties information is processed by the hippocampus subsystem, and the landmark information is passed to the striatum subsystem. With this information, the firing rates of HPCs and DSCs are computed using Equations 5 and 8. In the first trial (Figure 8b), the percentage of right choices at the six decision-making points is 52.15 percent. After several trials, the weights between ACs and HPCs/DSCs are updated using Hebbian learning. As shown in Figure 8b, from the fifth trial, the percentage of correct choices is greater than 87.23 percent. With the involvement of landmark information, the computational model can process the information effectively and predict the number of trials needed to reach the high performance level, five in the landmark group and six in the control group, matching the biological results.

Figure 8. Results of biological experiments and computational model. (a) Biological results in maze A, (b) computational results in maze A, (c) biological results in maze B, (d) computational results in maze B, (e) biological results in maze C, and (f) computational results in maze C. (Each panel plots the percentage of right choices, from 0.4 to 1.0, over trials T1 to T8 for the control, landmark, and MFB groups.)

THE AUTHORS
Lijuan Su is a PhD student in the Department of Computer Science at Zhejiang University. Her research interests include computational intelligence, neural computation, and artificial intelligence. Contact her at sulijuan@zju.edu.cn.
Nenggan Zheng is an associate professor in the Qiushi Academy for Advanced Studies at Zhejiang University. His research interests include neural computation and real-time systems. Zheng has a PhD in computer science from Zhejiang University. He’s the corresponding author for this work. Contact him at zng@zju.edu.cn.
Min Yao is a professor in the Department of Computer Science at Zhejiang University. His research interests include computational intelligence, fuzzy system, data mining, and service computing. Yao has a PhD in biomedical engineering from Zhejiang University. Contact him at myao@zju.edu.cn.
Zhaohui Wu is a professor in the Department of Computer Science at Zhejiang University. His research interests include pervasive computing, distributed computing, and computational intelligence. Wu has a PhD in computer science from Zhejiang University. Contact him at wzh@zju.edu.cn.

In the MFB group, the amygdala subsystem receives the computer-delivered MFB stimulus in the rat brain. The MFB stimulus intensity is calculated by the Q learning algorithm so that the closer the rat is to the goal, the higher the stimulus intensity. Based on the amygdala subsystem in the MFB group, the rat will build the association between spatial representation (encoded in the hippocampus subsystem) and reward prediction (computed by the computer). Because the rat chooses the movement with the higher reward prediction, it needs fewer trials to learn the optimal path to the goal than the control group, as shown in Figures 8b, 8d, and 8f. The trial number for ratbots in the MFB group to reach the same high level was the same in the computational model and the biological experiment in all three mazes.
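As a hedged illustration of how Q values might be turned into graded stimulus intensities, the sketch below maps a Q value linearly onto seven levels between 10 and 70, the endpoint values reported for the model. Spacing the levels evenly, rounding to the nearest level, clamping out-of-range values, and the function name `stimulus_intensity` are assumptions of this example.

```python
def stimulus_intensity(q, q_min, q_max,
                       si_low=10.0, si_high=70.0, levels=7):
    """Map a Q value to one of `levels` evenly spaced stimulus intensities."""
    if q_max == q_min:
        return si_low  # no spread in Q values yet, so use the lowest level
    fraction = (q - q_min) / (q_max - q_min)
    fraction = min(1.0, max(0.0, fraction))  # clamp to the observed range
    step = (si_high - si_low) / (levels - 1)
    # Rounding to the nearest level is an assumption of this sketch.
    return si_low + round(fraction * (levels - 1)) * step


# Hypothetical Q values spanning the range 0.0 to 1.0.
lowest = stimulus_intensity(0.0, 0.0, 1.0)
middle = stimulus_intensity(0.5, 0.0, 1.0)
highest = stimulus_intensity(1.0, 0.0, 1.0)
```

States nearer the goal typically carry larger Q values, so under this mapping they receive stronger stimulation.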
In Figures 8b, 8d, and 8f, the magenta line represents the simulated learning process for rats in the MFB group. In this group, the hippocampus, striatum, and amygdala subsystems are assumed to be involved. Spatial properties information is processed by the hippocampus subsystem, and landmark information is processed by the striatum subsystem. With these two kinds of information, the firing rates of HPCs and DSCs are computed by using Equations 5 and 8. But in the MFB group, after the rats choose an action via Equation 11, the amygdala subsystem calculates the MFB stimulus intensity and updates the weights between HPCs/DSCs and ACs using Equation 10 and Figure 6. In the first trial, as Figure 8b shows, the percentage of right choices is 69.21 percent, which is higher than that in the other two groups because of the involvement of the MFB subsystem. After several trials, the weights between ACs and HPCs/DSCs are updated using the algorithm in Figure 6 to choose the direction that leads to the goal position. As Figure 8b shows, the MFB group reaches the high performance level in fewer trials (three) than the other two groups (five and six), which means that with the involvement of the MFB stimulation, the ratbots can update the weights to the goal position much more quickly.

These results are promising. As a next step in our work, the MPMS model will be implemented in a robot to conduct similar behavior experiments and further validate our proposed computational processes. Comparative research will help us discover critical structure and weights in a hybrid bio-machine memory system, which can enhance the learning and memory functionalities of cyborg intelligent systems, such as the ratbots described in this article.

Acknowledgments
This work was supported by the National Key Basic Research Program of China (973 program 2013CB329504) and partially supported by the Zhejiang Provincial Natural Science Foundation of China (LZ14F020002). Correspondence and questions should be addressed to Nenggan Zheng (zng@zju.edu.cn).

References
1. Z. Wu, “The Convergence of Machine and
Biological Intelligence,” IEEE Intelligent
Systems, vol. 28, no. 5, 2013, pp. 28–43.
2. C. Sun et al., “Automatic Navigation for
Rat-Robots with Modeling of the Human
Guidance,” J. Bionic Eng., vol. 10, no. 1,
2013, pp. 46–56.
3. S.K. Talwar et al., “Behavioural
Neuroscience: Rat Navigation Guided
by Remote Control,” Nature, vol. 417,
2002, pp. 37–38.

4. M.A. Gluck and C.E. Myers, “Hippocampal Mediation of Stimulus Representation: A Computational Theory,”
Hippocampus, vol. 3, no. 4, 1993,
pp. 491–516.
5. N.M. White and R.J. McDonald,
“Multiple Parallel Memory Systems in
the Brain of the Rat,” Neurobiology of
Learning and Memory, vol. 77, no. 2,
2002, pp. 125–184.
6. J. O’Keefe, “A Review of the Hippocampal Place Cells,” Progress in
Neurobiology, vol. 13, no. 4, 1979,
pp. 419–439.


7. T. Strosslin et al., “Robust Self-Localisation and Navigation Based on Hippocampal Place Cells,” Neural Networks,
vol. 18, no. 9, 2005, pp. 1125–1140.
8. R. Chavarriaga et al., “A Computational Model of Parallel Navigation Systems
in Rodents,” Neuroinformatics, vol. 3,
no. 3, 2005, pp. 223–241.
9. R. Kempter, W. Gerstner, and J.L. Van Hemmen, “Hebbian Learning and Spiking Neurons,” Physical Rev. E, vol. 59, no. 4, 1999, pp. 4498–4514.
10. C. Zhang et al., “Bio-Robots Automatic
Navigation with Graded Electric Reward

Stimulation Based on Reinforcement
Learning,” Proc. 35th Annual Int’l Conf.
Eng. Medicine and Biology Soc., 2013,
pp. 6901–6904.
11. S.P. Singh and R.S. Sutton, “Reinforcement Learning with Replacing Eligibility Traces,” Machine Learning, vol. 22,
nos. 1–3, 1996, pp. 123–158.
