
Autonomous Learning of Ball Trapping in the Four-legged Robot League

Hayato Kobayashi (1), Tsugutoyo Osaki (2), Eric Williams (2), Akira Ishino (1), Ayumi Shinohara (2). (1) Kyushu University, Japan; (2) Tohoku University, Japan


  1. Autonomous Learning of Ball Trapping in the Four-legged Robot League
     Hayato Kobayashi (1), Tsugutoyo Osaki (2), Eric Williams (2), Akira Ishino (1), Ayumi Shinohara (2)
     (1) Kyushu University, Japan; (2) Tohoku University, Japan

  2. Motivation
     - Passwork in the four-legged robot league
     - KeepAway Soccer [Stone et al. 2001]
       - Benchmark of good passing abilities in the simulation league
     - Passing Challenge
       - Technical challenge this year
     - It is too difficult for dogs
     - KeepAway Soccer: http://www.cs.utexas.edu/~AustinVilla/sim/keepaway/

  3. Ball Trapping
     - Stop and control an oncoming ball

  4. One-dimensional Model
     - The passer is watching the chest of the receiver.
     - The receiver is watching the ball.

  5. Autonomous Method
     - Same way as diligent humans
     - [Figure labels: Kick, Wall]

  6. Training Equipment
     - Limit the ball's movement and the robot's locomotion to one dimension
     - Slope made of cardboard
     - Rails made of string

  7. Learning Method
     - Sarsa(λ) [Rummery and Niranjan 1994; Sutton 1996]
       - Reinforcement learning algorithm
     - Tile-coding (aka CMACs [Albus 1975])
       - Linear function approximation
       - For speeding up learning

  8. Reinforcement Learning
     - Acquire maps from state input to action output that maximize the sum of rewards
     - [Diagram: the Agent (AIBO) sends action a_t to the Environment and receives state s_t and reward r_{t+1}]
     - In our study, each time step t = 0, 1, 2, … means 0 ms, 40 ms, 80 ms, …

  9. Implementation
     - State s_t = (x_t, dx_t)
       - x_t: the distance from the robot to the ball, in [0, 2000] (mm)
       - dx_t: the difference between the current x_t and the x_t of one time step before, in [-200, 200] (mm)
     - Action a_t
       - ready: move the head to watch the ball
       - trap: initiate the trapping motion
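
To illustrate how the state (x, dx) above could feed the tile-coding approximator named on the Learning Method slide, here is a minimal sketch. The value ranges come from the slide; the number of tilings, the tiles per dimension, the uniform offsets, and the names `active_tiles` and `_scale` are assumptions for illustration, not the authors' settings.

```python
import math

X_RANGE = (0.0, 2000.0)      # distance to the ball in mm (from the slide)
DX_RANGE = (-200.0, 200.0)   # change in distance per time step in mm (from the slide)
NUM_TILINGS = 8              # assumption: not stated in the slides
TILES_PER_DIM = 10           # assumption: not stated in the slides
SIDE = TILES_PER_DIM + 1     # one extra tile per dimension covers the offset
NUM_FEATURES = NUM_TILINGS * SIDE * SIDE


def _scale(value, low, high):
    """Clip value to [low, high] and map it onto [0, TILES_PER_DIM]."""
    clipped = min(max(value, low), high)
    return (clipped - low) / (high - low) * TILES_PER_DIM


def active_tiles(x, dx):
    """Return one active feature index per tiling for the state (x, dx)."""
    xs = _scale(x, *X_RANGE)
    ds = _scale(dx, *DX_RANGE)
    tiles = []
    for t in range(NUM_TILINGS):
        offset = t / NUM_TILINGS  # each tiling is shifted by a fraction of a tile
        ix = int(math.floor(xs + offset))
        idx = int(math.floor(ds + offset))
        tiles.append(t * SIDE * SIDE + ix * SIDE + idx)
    return tiles
```

Nearby states share most of their active tiles, which is what lets a linear approximator generalize between them; distant states share none.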

  10. Implementation
      - Reward r_{t+1}
        - Positive
          - If the ball was correctly captured between the chin and the chest after the trap action.
        - Negative
          - If the trap action failed, or
          - If the ball touches the chest PSD sensor before the trap action is performed.
        - Zero
          - Otherwise
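
The reward rules above can be written as a small function. This is a sketch only: the slides give the sign of each reward but not its magnitude, so the values +1 and -1, the function name `reward`, and the boolean inputs are assumptions.

```python
def reward(trap_finished, ball_captured, chest_sensor_touched):
    """Return the reward for one time step.

    trap_finished: the trap action was performed and its outcome is known.
    ball_captured: the ball ended up between the chin and the chest.
    chest_sensor_touched: the ball reached the chest PSD sensor.
    """
    if trap_finished:
        # Positive on a correct capture, negative on a failed trap.
        return 1.0 if ball_captured else -1.0
    if chest_sensor_touched:
        # The ball hit the chest before any trap action was performed.
        return -1.0
    return 0.0  # nothing decisive happened yet
```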

  11. Implementation
      - Episode
        - The period from kicking the ball to receiving any reward other than zero
      - [Figure labels: Kick!, Trap!]
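
For readers unfamiliar with Sarsa(λ), the following sketch shows one common form of the update with linear function approximation over binary features, applied step by step within an episode as defined above. The step size, discount, λ, ε-greedy exploration, the use of replacing traces, and the class name `SarsaLambda` are all assumptions; the slides do not give these details.

```python
import random

ACTIONS = ("ready", "trap")  # the two actions named in the slides


class SarsaLambda:
    """Sarsa(lambda) with a linear value function over binary features."""

    def __init__(self, num_features, alpha=0.1, gamma=1.0, lam=0.9, epsilon=0.1):
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
        self.w = {a: [0.0] * num_features for a in ACTIONS}  # weights per action
        self.z = {a: [0.0] * num_features for a in ACTIONS}  # eligibility traces

    def q(self, tiles, action):
        """Action value: the sum of the weights of the active features."""
        return sum(self.w[action][i] for i in tiles)

    def choose(self, tiles):
        """Epsilon-greedy action selection."""
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q(tiles, a))

    def start_episode(self):
        """Clear all traces at the start of an episode (the kick)."""
        for a in ACTIONS:
            self.z[a] = [0.0] * len(self.z[a])

    def update(self, tiles, action, reward, next_tiles=None, next_action=None):
        """Apply one Sarsa(lambda) step; omit next_* on the terminal step."""
        delta = reward - self.q(tiles, action)
        if next_tiles is not None:
            delta += self.gamma * self.q(next_tiles, next_action)
        for i in tiles:
            self.z[action][i] = 1.0  # replacing traces
        step = self.alpha / len(tiles)
        for a in ACTIONS:
            for i, trace in enumerate(self.z[a]):
                self.w[a][i] += step * delta * trace
                self.z[a][i] = self.gamma * self.lam * trace
```

Because an episode ends at the first nonzero reward, the final `update` call is the one without `next_tiles`, and the eligibility traces carry that reward back to the earlier `ready` steps.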

  12. Experiments
      - Using one robot
      - Using two robots
        - without communication
        - with communication

  13. Using One Robot
      - Earlier phase (video): https://youtu.be/hv1sgIZLpKA

  14. Using One Robot
      - Later phase (video): https://youtu.be/XJBllv7wJXQ

  15. Result of Learning Using One Robot
      [Plot: trapping success rate (0 to 100) for every 10 episodes, over episodes 0 to 350]
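
The plotted quantity, the trapping success rate for every 10 episodes, can be computed from a per-episode outcome log as sketched below. The function name `success_rate_per_block` and the boolean log format are assumptions for illustration.

```python
def success_rate_per_block(outcomes, block=10):
    """Return the success rate (in percent) of each consecutive block of episodes.

    outcomes: list of booleans, True where the trap succeeded.
    """
    rates = []
    for start in range(0, len(outcomes), block):
        chunk = outcomes[start:start + block]
        rates.append(100.0 * sum(chunk) / len(chunk))
    return rates
```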

  16. Episodes 1 … 50
      [Scatter plot: result of each episode, with x (0 to 2000) on the horizontal axis and dx (-200 to 200) on the vertical axis. Legend: ● successful (success); × failed in spite of trying (failure); ▲ failed because of doing nothing (collision)]

  17. Episodes 51 … 100
      [Scatter plot: result of each episode, same axes and legend as for episodes 1 … 50]

  18. Episodes 101 … 150
      [Scatter plot: result of each episode, same axes and legend as for episodes 1 … 50]

  19. Episodes 151 … 200
      [Scatter plot: result of each episode, same axes and legend as for episodes 1 … 50]

  20. Using Two Robots
      - Simply replace the slope with another robot
      - Active Learner (AL)
        - Original robot
        - Same as in the case of training using one robot
      - Passive Learner (PL)
        - Replaces the slope
        - Does not approach the ball if the trapping failed

  21. Using Two Robots
      - Earlier phase (video): https://youtu.be/sXkVYZjOzjg

  22. Using Two Robots
      - Later phase (video): https://youtu.be/opvoyv9h-GU

  23. Result of Learning Using Two Robots Without Communication
      [Plot: trapping success rate (0 to 100) for every 10 episodes, over episodes 0 to 350, with one curve each for the Active Learner (AL) and the Passive Learner (PL)]

  24. Problem of Using Two Robots
      - Takes a long time to learn
        - AL can only learn when PL itself succeeds
        - Cannot learn if the ball is not returned
      - Even if we use two ALs, the problem is not resolved
        - They just learn slowly, though simultaneously.

  25. Solution
      - Sharing their experiences
      - Their experiences include
        - Action a_t (trap or ready)
        - State variables s_t = (x_t, dx_t)
        - Reward r_{t+1}
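
One way to picture the shared experience is as a record holding exactly the three items listed above, serialized so that the other robot can replay it through its own learning update. The slides do not say how the robots actually exchange data, so the JSON encoding and the names `Experience`, `encode`, and `decode` are assumptions for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Experience:
    x: float        # distance to the ball in mm (state variable x_t)
    dx: float       # change in distance since the last step (state variable dx_t)
    action: str     # "ready" or "trap" (action a_t)
    reward: float   # reward r_{t+1} received after the action


def encode(episode):
    """Serialize one robot's list of experiences for sending to the other robot."""
    return json.dumps([asdict(e) for e in episode])


def decode(payload):
    """Rebuild the list of experiences on the receiving robot."""
    return [Experience(**item) for item in json.loads(payload)]
```

The receiver could then feed each decoded record into the same update it applies to its own experience, so that both robots learn from every episode regardless of which one attempted the trap.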

  26. Result of Learning Using Two Robots With Communication
      [Plot: trapping success rate (0 to 100) for every 10 episodes, over episodes 0 to 400, with one curve each for the Active Learner (AL) and the Passive Learner (PL)]

  27. Conclusion
      - The goal of pass-work is achieved in one dimension
        - Learned the skills without human intervention
        - Learned more quickly by exchanging experiences with each other

  28. Future Work
      - Extend trapping skills to two dimensions
        - Layered Learning [Stone 2000]
      - Make goalies stronger
      - Make robots learn passing skills simultaneously

  29. Thank you for your attention! Bremen is a good town!
