
From the figures above, it can be seen that the Subsumption Architecture robot's movement is sharp and not smooth; there are many sharp turns in its trajectory. From the experiment videos it appears that this robot is also faster than the other. However, for this reaching-target behavior that speed is not very useful, because when the robot moves too fast with such sharp movements, the target can be "lost" from the robot's sight. On the other hand, the Motor Schema robot's movement is smoother than the preceding one. The sharp turns are not completely gone, but their number is smaller than before. This robot moves more slowly, but it detects the target location more accurately. That is why the time this robot needs to reach the target is shorter than the first one's.
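To make the difference between the two coordination methods concrete, the sketch below contrasts priority-based suppression (Subsumption Architecture) with weighted vector summation (Motor Schema). It is only an illustration: the behavior functions, motor-command encoding, thresholds, and weights are assumptions and are not taken from the robot's actual implementation.

    # Illustrative sketch (assumed encodings, not the authors' code).
    # A behavior returns a (left_speed, right_speed) motor command.

    def avoid_obstacle(dist):
        # Hypothetical behavior: active only when an obstacle is close.
        if dist < 20:              # assumed distance threshold (cm)
            return (-30, 30)       # spin away sharply
        return None                # inactive

    def reach_target(bearing):
        # Hypothetical behavior: steer proportionally toward the target.
        return (40 - bearing, 40 + bearing)

    def subsumption(dist, bearing):
        # The higher-priority behavior suppresses the lower one completely,
        # so the command switches abruptly -> sharp turns in the trajectory.
        cmd = avoid_obstacle(dist)
        return cmd if cmd is not None else reach_target(bearing)

    def motor_schema(dist, bearing, w_avoid=0.6, w_target=0.4):
        # All behaviors contribute; outputs are blended by weights,
        # so the command changes gradually -> smoother trajectory.
        avoid = avoid_obstacle(dist) or (0, 0)
        target = reach_target(bearing)
        return tuple(w_avoid * a + w_target * t for a, t in zip(avoid, target))

Under these assumptions, the switching in subsumption() reproduces the sharp turns observed for the Subsumption Architecture robot, while the blending in motor_schema() reproduces the smoother but slower motion of the Motor Schema robot.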

4.3 Target versus obstacle experiment

This experiment is done to observe the robot's characteristics when the target and an obstacle are located near the robot. The experiment result is shown in Figure 13. From the figure it can be seen that the Subsumption Architecture robot gives a more reactive action by avoiding the obstacle and, at the same time, leaving the target. This is reasonable because obstacle avoidance is the robot's most important behavior. Meanwhile, the Motor Schema robot moves more slowly, so it can still detect the target. This can happen because the robot considers the target location that is near to it.

Figure 13. Subsumption Architecture and Motor Schema robot near obstacle and target

4.4 Q learning - obstacle avoidance behavior with fixed learning rate

In this experiment, Q learning is applied to the obstacle avoidance behavior only. In order to watch the robot's performance, a simple obstacle structure is prepared. The Q learning algorithm applied on the robot uses α = 0.7 and γ = 0.7, and it utilizes a greedy method for the exploration-exploitation policy. The robot's performance at the beginning and the end of the trial is shown in Figure 14 and Figure 15. It can be seen from those figures that the learning result can differ from one robot to another: the first robot tends to go to the right and the second one chooses the left direction. Both of them succeed in avoiding the obstacle. This can happen because Q learning gives each robot the intelligence to decide the best action for itself.

The robot's goal, from the Q learning point of view, is to collect as many positive rewards as possible. Graphs of the average reward every ten iterations and the total rewards during the experiment are shown in Figure 16 and Figure 17. From Figure 16, it can be seen that the average reward received by the robot improves over time. In the learning phase the robot still receives many negative rewards, but after 5 steps it starts to collect positive rewards. Figure 17 shows that the total accumulated rewards collected by the robot grow larger over time. So it can be concluded that the robot can maximize its reward after learning for some time.

Figure 14. Robot's performance at the beginning and the end of trial 1

Figure 15. Robot's performance at the beginning and the end of trial 2

Figure 16. Average reward every tenth iteration

Figure 17. Total rewards of Q learning obstacle avoidance behavior
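For clarity, the following is a minimal sketch of the Q learning update described in Section 4.4, using α = 0.7 and γ = 0.7 with greedy action selection. The state encoding, the action set, and the small ε exploration term are assumptions added for illustration; the paper only states that a greedy exploration-exploitation method is used.

    import random

    ALPHA, GAMMA = 0.7, 0.7                       # learning rate and discount from Section 4.4
    ACTIONS = ["forward", "turn_left", "turn_right"]   # assumed action set
    Q = {}                                        # Q[(state, action)] -> value, initialized to 0

    def choose_action(state, epsilon=0.1):
        # Greedy selection with a small assumed exploration probability.
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

    def update(state, action, reward, next_state):
        # One-step Q learning update:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

With this update rule, actions that lead toward the obstacle accumulate negative values from the negative rewards received early in learning, so the greedy policy increasingly selects turns that keep the robot clear of the obstacle. This is consistent with the rising average and total rewards reported in Figure 16 and Figure 17.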

4.5 Q learning - obstacle avoidance behavior with varying learning rate