From those figures above, it can be seen that the Subsumption Architecture robot’s movement is sharp and not smooth; there are many sharp turns in this robot’s trajectory. From the experiment video it also appears that this robot is faster than the other. However, for the reaching-target behavior this speed is not very useful, because when the robot moves too fast with such sharp movements, the target can be “lost” from the robot’s sight. On the other hand, the Motor Schema robot’s movement is smoother than the preceding one. The sharp turns are not completely gone, but their number is smaller than before. This robot has slower movement, but it has more accurate detection of the target location. That is why the time needed by this robot to reach the target is shorter than the first one’s.

4.3 Target versus obstacle experiment
This experiment is done to observe the robot’s characteristics when the target and an obstacle are located near the robot. The experiment result is shown in Figure 13. From that figure it can be seen that the Subsumption Architecture robot gives a more reactive action by avoiding the obstacle and, at the same time, leaving the target. This is reasonable because obstacle avoidance is the most important behavior of the robot. Meanwhile, the Motor Schema robot moves more slowly, so it can still detect the target. This can happen because the robot considers the target location that is near to it.
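The difference between the two coordination methods observed here can be illustrated with a minimal sketch. This is not the controller running on the physical robot; the behavior outputs and gains below are assumptions made only for illustration.

```python
def subsumption(behaviors):
    """Subsumption-style arbitration: the highest-priority active behavior
    completely suppresses the lower ones (winner takes all)."""
    # behaviors: list of (active, (v_left, v_right)) ordered from the
    # highest priority (obstacle avoidance) down to the lowest (reach target).
    for active, command in behaviors:
        if active:
            return command           # abrupt switching -> sharp trajectory
    return (0.0, 0.0)                # nothing active: stop

def motor_schema(behaviors, gains):
    """Motor schema coordination: every behavior always contributes a vector
    and the robot follows the weighted vector sum (blended -> smoother motion)."""
    vl = sum(g * c[0] for (_, c), g in zip(behaviors, gains))
    vr = sum(g * c[1] for (_, c), g in zip(behaviors, gains))
    return (vl, vr)

# Example: obstacle avoidance wants to swerve, target seeking wants to go straight.
behaviors = [(True, (-0.4, 0.6)),   # avoid obstacle (high priority)
             (True, (0.5, 0.5))]    # reach target (low priority)

print("subsumption :", subsumption(behaviors))               # only avoidance acts
print("motor schema:", motor_schema(behaviors, [1.0, 1.0]))  # commands are blended
```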
4.4 Q learning - obstacle avoidance behavior with fixed learning rate
In this experiment, Q learning is applied to the obstacle avoidance behavior only. In order to watch the robot’s performance, a simple obstacle structure is prepared. The Q learning algorithm applied on the robot uses α = 0.7 and γ = 0.7, and it utilizes the greedy method for the exploration – exploitation policy. The robot’s performance at the beginning and at the end of the trial is shown in Figure 14 and Figure 15. It can be seen from those figures that the learning result can differ from one robot to another: the first robot tends to go to the right and the second one chooses the left, and both of them succeed in avoiding the obstacle. This can happen because Q learning gives each robot the intelligence to decide the best action for itself.
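A minimal sketch of the tabular Q learning step with these settings is given below. The state and action encoding, the table size, and the reward values are illustrative assumptions, not the exact ones used on the robot.

```python
ALPHA = 0.7    # learning rate (fixed in this experiment)
GAMMA = 0.7    # discount factor
N_STATES, N_ACTIONS = 8, 3          # assumed sizes: sonar-range states x {left, forward, right}

# Q-table initialized to zero
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def choose_action(state):
    """Greedy selection, as described in the text: always pick the action
    with the highest current Q-value (ties broken by the lowest index)."""
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    """One-step Q learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])

# One illustrative iteration: the robot is near an obstacle (state 3),
# turns right (action 2), receives a positive reward for not colliding,
# and ends up in a free state (state 0).
a = choose_action(3)
update(3, a, reward=+1.0, next_state=0)
```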
The robot’s goal, from the Q learning point of view, is to collect as many positive rewards as possible. Graphs of the average reward every ten iterations and of the total reward during the experiment are shown in Figure 16 and Figure 17. From Figure 16 it can be seen that the average reward received by the robot gets better over time: in the learning phase the robot still receives many negative rewards, but after 5 steps it starts to collect positive rewards. Figure 17 shows that the total accumulated reward collected by the robot grows over time. So it can be concluded that the robot can maximize its reward after learning for some time.
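For reference, the two quantities plotted in Figures 16 and 17 can be computed from a per-iteration reward log as sketched below; the reward trace used here is a made-up example, not data from the experiment.

```python
def ten_iteration_averages(rewards):
    """Average reward over consecutive ten-iteration windows (as in Figure 16)."""
    return [sum(rewards[i:i + 10]) / len(rewards[i:i + 10])
            for i in range(0, len(rewards), 10)]

def cumulative_rewards(rewards):
    """Running total of the rewards collected so far (as in Figure 17)."""
    totals, running = [], 0.0
    for r in rewards:
        running += r
        totals.append(running)
    return totals

# Illustrative trace: mostly negative rewards early on, positive rewards later.
trace = [-1, -1, 1, -1, -1] * 4 + [1, 1, -1, 1, 1] * 8
print(ten_iteration_averages(trace))
print(cumulative_rewards(trace)[-1])   # final accumulated reward
```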
Figure 13. Subsumption Architecture and Motor Schema robot near obstacle and target
Figure 14. Robot’s performance at the beginning and the end of trial 1

Figure 15. Robot’s performance at the beginning and the end of trial 2
Figure 16. Average reward every tenth iteration
Figure 17. Total rewards of Q learning obstacle avoidance behavior.
4.5 Q learning - obstacle avoidance behavior with varying learning rate