The Gizmo Player: Simon Dollé, Jan Kopcsek, Alper Tunga, Dresden

  1. The Gizmo Player
     Simon Dollé, Jan Kopcsek, Alper Tunga
     Faculty of Computer Science, Department of Computer Science, Institute of Intelligent Systems
     Dresden, 13.02.2008

  2. Finding a heuristic function
     Two ways of learning a heuristic function:
     • Deductive
       – Analyzing the rules
       – Identifying common elements such as game boards or pieces
       – Finding patterns
     • Inductive
       – Playing and learning from experience
       – Monte Carlo strategy
     TU Dresden, 13.02.2008, Gizmo Player, Slide 2 of 10

  3. Monte Carlo Strategy
     • Play random games
     • Compute the mean score of each move
     • Use these means as a heuristic function
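
As a minimal sketch of the Monte Carlo strategy above, assume a hypothetical toy game that is not part of the slides: one pile of stones, players 0 and 1 alternately take 1 to 3 stones, and whoever takes the last stone wins (score 100 for player 0 on a win, 0 otherwise). The code plays random games after each legal first move and averages the scores. The names `legal_moves`, `random_playout` and `monte_carlo_values` are illustrative only, not the Gizmo Player's code.

```python
import random
from statistics import mean

# Hypothetical toy game (an assumption, not from the slides): one pile of stones,
# players 0 and 1 alternately take 1-3 stones, whoever takes the last stone wins.
# Scores use a 0..100 range: 100 if player 0 wins, 0 otherwise.
MAX_TAKE = 3


def legal_moves(pile):
    return list(range(1, min(MAX_TAKE, pile) + 1))


def random_playout(pile, to_move, rng):
    """Play uniformly random moves until the pile is empty; return player 0's score."""
    while True:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:  # the player who just moved took the last stone
            return 100 if to_move == 0 else 0
        to_move = 1 - to_move


def monte_carlo_values(pile, games_per_move=2000, seed=0):
    """Mean score of player 0 (to move at `pile`) for each legal first move."""
    rng = random.Random(seed)
    values = {}
    for move in legal_moves(pile):
        remaining = pile - move
        if remaining == 0:
            values[move] = 100.0  # taking the last stone wins immediately
            continue
        scores = [random_playout(remaining, 1, rng) for _ in range(games_per_move)]
        values[move] = mean(scores)
    return values


values = monte_carlo_values(5)
best_move = max(values, key=values.get)
```

For a pile of 5, the exact averages under random play are about 67 for taking one stone and 50 for the other two moves, so the estimates should rank taking one stone first. Note that every move receives the same number of games, which is the inefficiency slide 4 points out.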

  4. Monte Carlo Strategy
     • Problem: the same effort is spent on interesting and uninteresting moves
     • This is equivalent to playing against a dummy player that moves at random
     • UCT algorithm (Upper Confidence Bounds applied to Trees): an algorithm that balances
       – Exploitation of interesting parts of the graph
       – Exploration of new parts
     • This makes the random games more realistic

  5. UCT Algorithm
     • As long as there are unexplored moves from our current state, explore them
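
The rule on slide 5 corresponds to an expansion step. The sketch below assumes a hypothetical `Node` structure that keeps a list of not-yet-explored moves; `apply_move` and `legal_moves` are placeholders for whatever game interface the player uses, not functions from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One game state in the search tree (hypothetical structure for illustration)."""
    state: object
    untried_moves: list                            # legal moves not explored yet
    children: dict = field(default_factory=dict)   # move -> Node
    visits: int = 0                                # games through this node


def expand_if_possible(node, apply_move, legal_moves):
    """Create and return a child for one unexplored move, or return None when every
    move from this state has been explored (the UCT score then decides instead)."""
    if not node.untried_moves:
        return None
    move = node.untried_moves.pop()
    child_state = apply_move(node.state, move)
    child = Node(child_state, list(legal_moves(child_state)))
    node.children[move] = child
    return child
```

Calling `expand_if_possible` repeatedly on the same node creates one child per legal move and then returns `None`, at which point selection switches to the score described on slide 6.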

  6. UCT Algorithm
     • As long as there are unexplored moves from our current state, explore them
     • Otherwise, choose the move with the highest score, computed from:
       h: the heuristic value
       n: the number of games through the parent node
       n_i: the number of games through the node
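
The score formula itself did not survive in the extracted slide text. Assuming it is the standard UCB1 term used by UCT, `score_i = h_i + C * sqrt(ln(n) / n_i)`, with an exploration constant `C` that the slides do not state, the selection rule can be sketched as follows. Here `h_i` is assumed to be a mean score in [0, 1]; `uct_score` and `select_child` are hypothetical names.

```python
import math


def uct_score(h_i, n, n_i, c=1.0):
    """ASSUMED UCB1 form: h_i + c * sqrt(ln(n) / n_i).
    h_i: heuristic value (mean score) of the child, n: games through the parent,
    n_i: games through the child, c: exploration constant (not given in the slides)."""
    return h_i + c * math.sqrt(math.log(n) / n_i)


def select_child(children, n, c=1.0):
    """Return the move with the highest score.
    `children` maps move -> (h_i, n_i); every n_i must be > 0 (all moves explored)."""
    return max(children, key=lambda move: uct_score(children[move][0], n, children[move][1], c))
```

With `c = 0` the rule always follows the best mean (pure exploitation); larger values of `c` give rarely tried moves a bonus that shrinks as `n_i` grows, which is the balance slide 4 describes.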

  7. UCT Algorithm
     Which move to play?
     • The one with the highest heuristic value
     In multiplayer games:
     • Store the heuristic value for each player
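
Putting slides 5 to 7 together, a compact UCT loop might look like the sketch below, again on a hypothetical stone-taking game (take 1 to 3 stones, taking the last stone wins). Every node stores summed scores per player, as slide 7 suggests for multiplayer games; the player to move at each node maximizes its own mean plus the assumed UCB1 bonus, and the root move with the highest heuristic value is played. The constant `c = 40` for scores from 0 to 100 and all names are assumptions.

```python
import math
import random

# Hypothetical toy game (an assumption, not from the slides): players 0 and 1
# alternately take 1-3 stones from one pile; whoever takes the last stone wins.
# Each player's score is 100 for a win and 0 for a loss.
PLAYERS = (0, 1)


def legal_moves(pile):
    return list(range(1, min(3, pile) + 1))


class Node:
    def __init__(self, pile, to_move):
        self.pile, self.to_move = pile, to_move
        self.untried = legal_moves(pile)       # moves not explored yet
        self.children = {}                     # move -> Node
        self.visits = 0                        # games through this node
        self.totals = [0.0 for _ in PLAYERS]   # summed scores, one entry per player

    def value(self, player):
        """Heuristic value: mean score of `player` over the games through this node."""
        return self.totals[player] / self.visits


def uct_search(root_pile, iterations=5000, c=40.0, seed=0):
    rng = random.Random(seed)
    root = Node(root_pile, 0)
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: every move here is explored, so follow the highest score
        # as seen by the player to move at the parent.
        while not node.untried and node.children:
            parent = node
            node = max(
                parent.children.values(),
                key=lambda ch: ch.value(parent.to_move)
                + c * math.sqrt(math.log(parent.visits) / ch.visits),
            )
            path.append(node)
        # Expansion: explore one unexplored move, if there is one.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.pile - move, 1 - node.to_move)
            node.children[move] = child
            node = child
            path.append(node)
        # Simulation: random game; an empty pile means the previous mover won.
        pile, to_move = node.pile, node.to_move
        winner = 1 - to_move
        while pile > 0:
            pile -= rng.choice(legal_moves(pile))
            winner, to_move = to_move, 1 - to_move
        # Backpropagation: every node on the path gets every player's score.
        for visited in path:
            visited.visits += 1
            for p in PLAYERS:
                visited.totals[p] += 100.0 if p == winner else 0.0
    # Play the root move whose node has the highest heuristic value for us.
    best = max(root.children, key=lambda m: root.children[m].value(root.to_move))
    return best, root


best_move, root = uct_search(5)
```

For a pile of 5, the search should settle on taking one stone, the minimax-optimal move, with most of the games spent below that move rather than spread evenly as in the plain Monte Carlo strategy.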

  8. Good points
     • The heuristic is directly linked to the final score
     • The heuristic converges to the minimax values
     • Time-scalable: more time allows more simulated games
     • Easily parallelisable
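
One reason the approach is easily parallelisable: Monte Carlo statistics are just sums and counts, so independent workers can play games with private random generators and their results merge by addition. The sketch below reuses a hypothetical stone-taking game and uses `ThreadPoolExecutor` for brevity; in CPython, CPU-bound playouts would need `ProcessPoolExecutor` (with picklable, module-level functions) for an actual speedup. The names `worker` and `parallel_values` are illustrative.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy game (an assumption, not from the slides): take 1-3 stones
# from one pile, whoever takes the last stone wins; player 0 moves first.


def legal_moves(pile):
    return list(range(1, min(3, pile) + 1))


def playout_score(pile, to_move, rng):
    """Random game from `pile`; return player 0's score (100 for a win, else 0)."""
    while True:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            return 100 if to_move == 0 else 0
        to_move = 1 - to_move


def worker(pile, games, seed):
    """One independent worker: summed scores and game counts per first move."""
    rng = random.Random(seed)  # private generator, so workers share no state
    totals, counts = Counter(), Counter()
    for _ in range(games):
        move = rng.choice(legal_moves(pile))
        rest = pile - move
        totals[move] += 100 if rest == 0 else playout_score(rest, 1, rng)
        counts[move] += 1
    return totals, counts


def parallel_values(pile, workers=4, games_per_worker=1000):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(worker, [pile] * workers,
                                [games_per_worker] * workers, range(workers)))
    # Merging is plain addition because the statistics are sums and counts.
    totals, counts = Counter(), Counter()
    for worker_totals, worker_counts in results:
        totals.update(worker_totals)
        counts.update(worker_counts)
    return {move: totals[move] / counts[move] for move in counts}


values = parallel_values(5)
```

Because each worker is seeded separately, the merged result is reproducible regardless of how the threads are scheduled.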

  9. Problems
     • Simultaneous moves:
       – Which rule should be used to choose the nodes to explore?
       – Which move should be played?
     • Long games and loops:
       – The depth-first search problem

  10. Thank you for your attention, and good luck to your players
