
  1. CS344M Autonomous Multiagent Systems
     Patrick MacAlpine
     Department of Computer Science
     The University of Texas at Austin

  3. Good Afternoon, Colleagues
     Are there any questions?
     • How is SMDP different from MDP?
     • Advantages of tile coding vs other approaches?
     • What about SPAR (Strategic Position by Attraction and Repulsion)?

  8. Logistics
     • Progress reports due at beginning of class today
     • Progress report peer review due next Thursday
       – reports to review will be sent out shortly
     • Prize for winning class tournament
     • 10+ students went to Undergraduate Writing Center :)

  13. Reinforcement Learning
      (Image from Wikipedia)
      Markov Decision Process (MDP)
      Important questions:
      • What is your state space?
      • What is your action space?
      • What is your reward function?
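
A minimal sketch of how the three questions above turn into code. The five-cell corridor task, the `MDP` container, and every name below are hypothetical illustrations, not taken from the slides: the state space is a cell index, the action space is left/right, and the reward function pays 1.0 on first reaching the goal cell.

```python
from dataclasses import dataclass
from typing import Callable, List

State = int   # assumption: a state is just a cell index
Action = str  # assumption: actions are named by strings


@dataclass(frozen=True)
class MDP:
    """Hypothetical container bundling the three design choices."""
    states: List[State]                              # state space
    actions: List[Action]                            # action space
    transition: Callable[[State, Action], State]     # deterministic here
    reward: Callable[[State, Action, State], float]  # reward function


N_CELLS = 5
GOAL = N_CELLS - 1


def step_cell(s: State, a: Action) -> State:
    """Move one cell left or right, clamped to the corridor; the goal absorbs."""
    if s == GOAL:
        return s
    delta = 1 if a == "right" else -1
    return min(max(s + delta, 0), GOAL)


def corridor_reward(s: State, a: Action, s_next: State) -> float:
    """Pay 1.0 only on the transition that first enters the goal cell."""
    return 1.0 if (s_next == GOAL and s != GOAL) else 0.0


corridor = MDP(
    states=list(range(N_CELLS)),
    actions=["left", "right"],
    transition=step_cell,
    reward=corridor_reward,
)
```

In a task like keepaway the same three slots would be filled very differently (continuous features, higher-level actions), which is why the questions are worth answering explicitly before picking a learning algorithm.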

  15. SARSA $(s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1})$
      (Image from Wikipedia)
      Learn Q table (value function) for state-action pairs:
      $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]$
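
The update rule above can be sketched as a single tabular step. This is an illustration under assumptions, not the course's implementation: the function name `sarsa_update`, the dictionary-backed Q table, the state and action labels, and the default values of `alpha` and `gamma` are all made up for the example.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one SARSA step to Q in place and return the new Q(s, a).

    `reward` is the reward observed after taking action `a` in state `s`;
    `a_next` is the action actually chosen in `s_next` (on-policy).
    """
    td_target = reward + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


# Q table keyed by (state, action); unseen pairs default to 0.0.
Q = defaultdict(float)
Q[("s1", "pass")] = 1.0  # hypothetical existing estimate

new_value = sarsa_update(Q, "s0", "hold", 0.5, "s1", "pass")
```

With these numbers the step works out to 0.1 × (0.5 + 0.9 × 1.0 − 0), so `new_value` is about 0.14. Using the action actually taken in the next state, rather than the greedy one, is what distinguishes SARSA from Q-learning.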

  17. Keepaway
      • Keepaway videos
      • Slides

  21. Keepaway Discussion
      • Could you use learned policies for a full soccer game?
      • Could we apply competitive co-evolution?
      • Other sub-tasks in soccer that might be learnable?

  22. Half Field Offense <Slides>

  23. Policy Search vs Value Function Based RL

                   Policy Search                      Value Function Based
      Learn        Policy parameters                  Value function
      Good for     Tuning parameter values            Learning discrete actions
      Evaluation   Fitness function                   Reward function
      Algorithms   CMA-ES, genetic algorithms, etc.   SARSA, Q-learning, etc.
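
To make the policy search side of the comparison above concrete, here is a minimal sketch that tunes parameter values directly against a fitness function. It uses plain stochastic hill climbing as a stand-in, which is much simpler than CMA-ES or a genetic algorithm. The two-parameter policy, the target values inside `fitness`, and all names are hypothetical.

```python
import random


def fitness(params):
    """Hypothetical fitness: higher is better, peaking at (0.3, -1.2).

    In practice this would be a score from running the policy,
    e.g. an average episode outcome.
    """
    target = (0.3, -1.2)
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def hill_climb(fitness_fn, initial, iterations=200, sigma=0.1, seed=0):
    """Perturb the best parameters with Gaussian noise; keep improvements only."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    best = list(initial)
    best_fit = fitness_fn(best)
    for _ in range(iterations):
        candidate = [p + rng.gauss(0.0, sigma) for p in best]
        cand_fit = fitness_fn(candidate)
        if cand_fit > best_fit:
            best, best_fit = candidate, cand_fit
    return best, best_fit


start = [0.0, 0.0]
best_params, best_fitness = hill_climb(fitness, start)
```

Note that no value function is learned here: each candidate is judged only by its overall fitness, whereas SARSA and Q-learning use per-step rewards to estimate values for state-action pairs.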
