
Richard Gibson Ph.D. Thesis Presentation, December 6, 2013



  1. Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker Agents Richard Gibson Ph.D. Thesis Presentation December 6, 2013

  2. Computer Poker Research Group

  3. Heads Up Limit Texas Hold'em: Fold? Bet! Call? Raise? (Image source: ebaumsworld.com)

  4. Heads Up No-limit Texas Hold'em: Bet! All-in! (Image source: ebaumsworld.com)

  5. 3-Player Limit Texas Hold'em: Call. Fold? Bet! Call? Raise? (Image sources: toonpool.com, ebaumsworld.com)

  6. 3-Player Limit Texas Hold'em: Hyperborean3p, 2010 - 2013 (Image sources: toonpool.com, ebaumsworld.com)

  7. Hyperborean3p 2009 ● No theory – 3-player – Imperfect recall ● Slow ● Memory expensive

  8. Hyperborean3p 2009: ● No theory – 3-player – Imperfect recall ● Slow ● Memory expensive. Hyperborean3p 2013: ● New theory – Many players – Imperfect recall ● Fast ● Improved performance with limited memory

  9. Outline of Presentation ● Background – Counterfactual Regret Minimization (CFR) ● Theoretical Advancements for CFR in: – Many player games – Imperfect recall games ● CFR Speed-Ups ● Tricks with Memory Limitations ● Conclusion + Future Work

  10. Outline of Presentation ● Background – Counterfactual Regret Minimization (CFR) ● Theoretical Advancements for CFR in: – Many player games – Imperfect recall games ● CFR Speed-Ups ● Tricks with Memory Limitations ● Conclusion + Future Work

  11. Background - Kuhn Poker

  12. Background - Kuhn Poker [Figure: game tree, starting at the chance node c.]

  13. Background - Kuhn Poker [Figure: the chance node c deals the cards; the deals QK and QJ each occur with probability 1/6.]

  14. Background - Kuhn Poker [Figure: player 1 chooses Check (c) or Bet (b); player 1's nodes are grouped into an information set.] Check / Bet?

  15. Background - Kuhn Poker [Figure: after player 1 bets, player 2 chooses Fold (f) or Call (c).] Bet! Fold / Call?

  16. Background - Kuhn Poker [Figure: player 1 bets and player 2 folds; player 1's payoff is +1.] Bet! Fold.

  17. Background - Kuhn Poker [Figure: player 1 bets and player 2 calls; the showdown payoffs are +2 / -2 or -2 / +2, depending on the deal.] Bet! Call.

  18. Background - Kuhn Poker [Figure: after player 1 checks, player 2 chooses Check (c) or Bet (b).] Check. Check / Bet?

  19. Background - Kuhn Poker [Figure: both players check; the showdown payoffs are +1 / -1 or -1 / +1, depending on the deal.] Check. Check.

  20. Background - Kuhn Poker [Figure: player 1 checks and player 2 bets; player 1 chooses Fold (f, payoff -1) or Call (c, payoff +2 or -2). Player 1's nodes are grouped into an information set.] Bet! Fold / Call?

  21. Background - In general: Extensive-Form Game [Figure: the full Kuhn Poker game tree.]

  22. Background - In general: Extensive-Form Game with a Strategy Profile [Figure: the Kuhn Poker game tree with a probability on every action, for example .6 / .4 at player 1's first decision.]

  23. Background - In general: Extensive-Form Game → Nash Equilibrium Strategy Profile. Nash equilibrium: “No one can change their strategy and do any better.”

  24. Background - In general: Extensive-Form Game → Nash Equilibrium Strategy Profile. Nash equilibrium: “No one can change their strategy and do any better.” Every game has a Nash equilibrium. [Figure: example with probabilities 1/3, 1/3, 1/3.]

  25. Background - In general: Extensive-Form Game → ? → Nash Equilibrium Strategy Profile. Nash equilibrium: “No one can change their strategy and do any better.” Every game has a Nash equilibrium. [Figure: example with probabilities 1/3, 1/3, 1/3.]
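The Kuhn Poker payoffs walked through in slides 16 to 20 can be written as a small function. This is a minimal sketch, not code from the talk: the name `payoff_p1` and the history strings (`"bf"`, `"bc"`, `"cc"`, `"cbf"`, `"cbc"`) are illustrative assumptions that reuse the slides' labels c (check or call), b (bet) and f (fold).

```python
# Cards are ranked J < Q < K. The payoffs follow the slides:
# +/-1 after a fold or a check-check showdown, +/-2 after a called bet.
RANK = {"J": 0, "Q": 1, "K": 2}

def payoff_p1(deal, history):
    """Player 1's payoff at a terminal history.

    deal    -- two characters: player 1's card, then player 2's card
    history -- actions so far: 'c' = check/call, 'b' = bet, 'f' = fold
    """
    if history == "bf":       # player 1 bets, player 2 folds
        return 1
    if history == "cbf":      # player 1 checks, player 2 bets, player 1 folds
        return -1
    if history in ("cc", "bc", "cbc"):   # showdown
        stake = 1 if history == "cc" else 2
        p1_wins = RANK[deal[0]] > RANK[deal[1]]
        return stake if p1_wins else -stake
    raise ValueError(f"not a terminal history: {history!r}")
```

For example, `payoff_p1("QJ", "bc")` is the bet-and-call showdown that player 1 wins, and `payoff_p1("QK", "cc")` is the check-check showdown that player 1 loses. Player 2's payoff is the negative of player 1's, since the game is zero-sum.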

  26. Outline of Presentation ● Background – Counterfactual Regret Minimization (CFR) ● Theoretical Advancements for CFR in: – Many player games – Imperfect recall games ● CFR Speed-Ups ● Tricks for CFR with Memory Limitations ● Conclusion + Future Work

  27. CFR ● “The alpha-beta search of imperfect information games.” [Figure: Kuhn Poker game tree.]

  28. CFR ● “The alpha-beta search of imperfect information games.” ● Offline algorithm [Figure: Kuhn Poker game tree.]

  29. CFR ● “The alpha-beta search of imperfect information games.” ● Offline algorithm ● Iterative, “self-play” [Figure: Kuhn Poker game tree with every action probability set to .5.]

  30. CFR ● “The alpha-beta search of imperfect information games.” ● Offline algorithm ● Iterative, “self-play” ● For each iteration, update action probabilities at every information set. [Figure: Kuhn Poker game tree with updated action probabilities, for example .7 / .3 at player 1's first decision.]

  31. CFR: Average Strategy Profile = (Strategy 1 + Strategy 2 + ... + Strategy T) / T, where T = number of iterations. As T → ∞, Average Strategy Profile → Nash Equilibrium Strategy Profile.

  32. Background: Extensive-Form Game → CFR → Nash Equilibrium Strategy Profile

  33. Background: Kuhn Poker → CFR → Nash Equilibrium Strategy Profile
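The loop described in slides 27 to 31 (start from uniform probabilities, update the action probabilities at every information set on each iteration, then average the strategies) can be sketched for Kuhn Poker as follows. This is a plain two-player CFR sketch under stated assumptions, not the code behind the agents in the talk. All names are illustrative. Actions are encoded as `'p'` (pass: check or fold) and `'b'` (bet or call) instead of the slides' c / b / f labels, and the constant 1/6 chance probability is dropped from the regret updates because it does not change the resulting strategies.

```python
from itertools import permutations

regret_sum = {}    # information set -> cumulative regret for [pass, bet]
strategy_sum = {}  # information set -> reach-weighted sum of strategies

def terminal_payoff(cards, h):
    """Payoff for the player who would act next, or None if h is not terminal."""
    if len(h) < 2:
        return None
    player = len(h) % 2
    higher = cards[player] > cards[1 - player]
    if h[-1] == "p":
        if h == "pp":                    # check, check: showdown for 1
            return 1 if higher else -1
        return 1                         # the opponent folded to a bet
    if h[-2:] == "bb":                   # bet and call: showdown for 2
        return 2 if higher else -2
    return None

def current_strategy(key):
    """Regret matching: play in proportion to positive cumulative regret."""
    positive = [max(r, 0.0) for r in regret_sum.get(key, [0.0, 0.0])]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [0.5, 0.5]

def cfr(cards, h, reach, pending):
    """Return the value of h for the player to act; collect regret updates."""
    payoff = terminal_payoff(cards, h)
    if payoff is not None:
        return payoff
    player = len(h) % 2
    key = "JQK"[cards[player]] + h       # own card plus public action history
    strategy = current_strategy(key)
    utils = []
    for i, action in enumerate("pb"):
        child_reach = list(reach)
        child_reach[player] *= strategy[i]
        utils.append(-cfr(cards, h + action, child_reach, pending))
    value = sum(s * u for s, u in zip(strategy, utils))
    delta = pending.setdefault(key, [0.0, 0.0])
    acc = strategy_sum.setdefault(key, [0.0, 0.0])
    for i in range(2):
        delta[i] += reach[1 - player] * (utils[i] - value)  # counterfactual regret
        acc[i] += reach[player] * strategy[i]
    return value

def train(iterations):
    """Run CFR; return the first player's value averaged over all iterations."""
    deals = list(permutations(range(3), 2))   # 6 deals, each with probability 1/6
    total = 0.0
    for _ in range(iterations):
        pending = {}
        for cards in deals:
            total += cfr(cards, "", [1.0, 1.0], pending) / len(deals)
        for key, delta in pending.items():    # apply updates once per iteration
            regrets = regret_sum.setdefault(key, [0.0, 0.0])
            regrets[0] += delta[0]
            regrets[1] += delta[1]
    return total / iterations

def average_strategy():
    """The average strategy profile: (Strategy 1 + ... + Strategy T) / T."""
    return {k: [x / sum(v) for x in v] for k, v in strategy_sum.items()}
```

Calling `train(10000)` and then `average_strategy()` gives a profile keyed by strings such as `"Kb"` (holding K, facing a bet). In this two-player zero-sum game the returned average value should approach the known Kuhn Poker game value of -1/18 for the first player.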

  34. Background: Texas Hold'em (>10^14 information sets) → CFR → Nash Equilibrium Strategy Profile (> 5 million GB)

  35. Background: Large Extensive-Form Game → ? → Nash Equilibrium Strategy Profile

  36. Background: Large Extensive-Form Game → Abstract Game

  37. Background ● Merge card deals into buckets. Extensive-Form Game → Abstract Game

  38. Background ● Merge card deals into buckets. Extensive-Form Game → Abstract Game

  39. Background: Extensive-Form Game (>10^14) → Abstract Game (≈10^9)

  40. Background: Extensive-Form Game (>10^14) → Abstract Game (≈10^9) → CFR → Abstract Game Equilibrium Strategy

  41. Background: Extensive-Form Game (>10^14) → Abstract Game (≈10^9) → CFR → Abstract Game Equilibrium Strategy → Approximate Full Game Equilibrium Strategy (≈100 GB)
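The step "merge card deals into buckets" can be illustrated with a toy example. This is only a sketch of the general idea: the slides do not say how the buckets are built, so equal-size percentile bucketing on a single strength score is an assumption, the function name `bucket_hands` is hypothetical, and the strength values are made-up numbers.

```python
def bucket_hands(strength, num_buckets):
    """Map each hand to a bucket index, weakest hands in bucket 0.

    strength    -- dict from hand label to a strength score (higher is better)
    num_buckets -- number of equal-size percentile bands
    """
    ranked = sorted(strength, key=strength.get)
    return {hand: i * num_buckets // len(ranked) for i, hand in enumerate(ranked)}

# Made-up scores for illustration only.
example_strength = {"72o": 0.35, "T9s": 0.54, "AKs": 0.67, "AA": 0.85}
buckets = bucket_hands(example_strength, 2)
```

In the abstract game, an information set is then identified by the bucket index rather than the exact cards, so hands in the same bucket share one strategy. That sharing is what shrinks the game from the full size to the abstract size shown on the slides.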

  42. Outline of Presentation ● Background – Counterfactual Regret Minimization (CFR) ● Theoretical Advancements for CFR in: – Many player games – Imperfect recall games ● CFR Speed-Ups ● Tricks with Memory Limitations ● Conclusion + Future Work

  43. Theory – Many Player Games: Extensive-Form Game → CFR → Nash Equilibrium Strategy Profile (stamped “LIAR!”)

  44. Theory – Many Player Games: 2-player Zero-Sum Game → CFR → Nash Equilibrium Strategy Profile. 3-or-more Player Game → CFR → ? (Not equilibrium)

  45. Theory – Many Player Games: Annual Computer Poker Competition, 3-Player Limit Texas Hold'em, 2009

        Agent            Total Bankroll (mbb/g)
        Hyperborean3p     319 ± 2
        dpp               171 ± 2
        akuma             151 ± 2
        CMURingLimit      -37 ± 2
        dcu3pl            -63 ± 2
        Bluechip         -548 ± 2

      3-player Limit Texas Hold'em → CFR → Good strategy? (Not equilibrium)

  46. Theory – Many Player Games [Figure: Kuhn Poker game tree for the deals QJ and QK.]

  47. Theory – Many Player Games [Figure: Kuhn Poker game tree for the deals QJ and QK.]

  48. Theory – Many Player Games [Figure: Kuhn Poker game tree with the Dominated Strategies marked.]

  49. Theory – Many Player Games [Figure: Kuhn Poker game tree with the dominated strategies removed.]

  50. Theory – Many Player Games [Figure: the reduced Kuhn Poker game tree with an Iteratively Dominated Strategy marked.]

  51. Theory – Many Player Games: 3-or-more Player Game → CFR → Average Strategy Profile. As T → ∞: No Iteratively Dominated Strategies (New!) [G., arXiv ePrints 2013]

  52. Theory – Many Player Games: 3-or-more Player Game → CFR → Average Strategy Profile. As T → ∞: No Iteratively Dominated Strategies (New!). 3-or-more Player Game → CFR → “Current” Strategy Profile. For finite T: No Iteratively Dominated Strategies (New!) [G., arXiv ePrints 2013]

  53. Theory – Many Player Games 3-Player Limit Texas Hold'em - 2012 New!
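The notion of an iteratively dominated strategy from slides 48 to 50 can be illustrated on a small matrix game. This is a sketch under simplifying assumptions, not the thesis result itself: it handles a two-player normal-form game rather than a many-player extensive-form game, it only checks strict domination by another pure strategy, and the function name and payoff matrices are illustrative.

```python
def iterated_elimination(u_row, u_col):
    """Repeatedly delete strictly dominated pure strategies.

    u_row[r][c] and u_col[r][c] are the row and column player's payoffs.
    Returns the indices of the surviving rows and columns.
    """
    rows = list(range(len(u_row)))
    cols = list(range(len(u_row[0])))
    changed = True
    while changed:
        changed = False
        for r in rows[:]:
            # r is dominated if some other row is strictly better in every column
            if any(all(u_row[o][c] > u_row[r][c] for c in cols)
                   for o in rows if o != r):
                rows.remove(r)
                changed = True
        for c in cols[:]:
            # c is dominated if some other column is strictly better in every row
            if any(all(u_col[r][o] > u_col[r][c] for r in rows)
                   for o in cols if o != c):
                cols.remove(c)
                changed = True
    return rows, cols

# Illustrative 2 x 3 game: rows U, D and columns L, M, R.
u_row = [[1, 1, 0],
         [0, 0, 2]]
u_col = [[0, 2, 1],
         [3, 1, 0]]
survivors = iterated_elimination(u_row, u_col)
```

In this example column R is dominated from the start, row D becomes dominated only after R is removed, and column L becomes dominated only after D is removed. D and L are therefore iteratively dominated: they are eliminated only once earlier deletions have shrunk the game.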

  54. Outline of Presentation ● Background – Counterfactual Regret Minimization (CFR) ● Theoretical Advancements for CFR in: – Many player games – Imperfect recall games ● CFR Speed-Ups ● Tricks with Memory Limitations ● Conclusion + Future Work

  55. Imperfect Recall: Extensive-Form Game → Abstract Game → CFR → Abstract Game Equilibrium Strategy (stamped “LIAR!” twice)

  56. Imperfect Recall: “Perfect Recall” Abstract Game → CFR → Abstract Game Equilibrium Strategy. “Imperfect Recall” Abstract Game → CFR → ? (Not equilibrium)

  57. Imperfect Recall Pre-flop

  58. Imperfect Recall Pre-flop Flop

  59. Imperfect Recall Imperfect Recall Abstract Game

  60. Imperfect Recall Perfect Recall Abstract Game
