Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker Agents Richard Gibson Ph.D. Thesis Presentation December 6, 2013
Computer Poker Research Group
Heads Up Limit Texas Hold'em Source: ebaumsworld.com Fold? Bet! Call? Raise?
Heads Up No-limit Texas Hold'em Source: ebaumsworld.com Bet! All-in!
3-Player Limit Texas Hold'em Source: toonpool.com Source: ebaumsworld.com Call. Fold? Bet! Call? Raise?
3-Player Limit Texas Hold'em Source: toonpool.com Source: ebaumsworld.com 2010 - 2013 Hyperborean3p
Hyperborean3p 2009 ● No theory – 3-player – Imperfect recall ● Slow ● Memory expensive
Hyperborean3p 2009: ● No theory – 3-player – Imperfect recall ● Slow ● Memory expensive
Hyperborean3p 2013: ● New theory – Many players – Imperfect recall ● Fast ● Improved performance with limited memory
Outline of Presentation ● Background – Counterfactual Regret Minimization (CFR) ● Theoretical Advancements for CFR in: – Many player games – Imperfect recall games ● CFR Speed-Ups ● Tricks with Memory Limitations ● Conclusion + Future Work
Background - Kuhn Poker
[Figure: game tree, built up over several slides. Node labels: c = chance, 1 = player 1, 2 = player 2. Edge labels: c = check or call, b = bet, f = fold. Payoffs are written player 1 / player 2.]
● Chance deals the cards: ..., QK, QJ, ..., each deal with probability 1/6.
● Player 1 (holding Q): Check / Bet? Player 1's nodes after QK and QJ form one information set.
● Player 1 bets, player 2 folds: +1 for player 1 in both deals.
● Player 1 bets, player 2 calls: +2 / -2 after QJ, -2 / +2 after QK.
● Player 1 checks, player 2 checks: +1 / -1 after QJ, -1 / +1 after QK.
● Player 1 checks, player 2 bets: player 1 must Fold / Call? Folding gives -1 in both deals; calling gives +2 after QJ and -2 after QK. These two player 1 nodes again form one information set.
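The payoffs on the tree above can be collected into one small function. This is a minimal sketch, assuming a history string built from the slide's edge labels (`c`, `b`, `f`); the names `RANK` and `payoff_p1` are illustrative and do not come from the presentation.

```python
# Card ranks in Kuhn Poker: K beats Q beats J.
RANK = {"J": 0, "Q": 1, "K": 2}

def payoff_p1(card1, card2, history):
    """Return player 1's payoff at a terminal history.

    history uses the edge labels from the tree: c = check or call,
    b = bet, f = fold. Player 2 receives the negative of this value.
    """
    p1_wins = RANK[card1] > RANK[card2]
    if history == "cc":              # check, check: showdown for 1
        return 1 if p1_wins else -1
    if history == "bf":              # bet, fold: player 1 wins 1
        return 1
    if history == "cbf":             # check, bet, fold: player 1 loses 1
        return -1
    if history in ("bc", "cbc"):     # a bet was called: showdown for 2
        return 2 if p1_wins else -2
    raise ValueError(f"not a terminal history: {history!r}")
```

For example, `payoff_p1("Q", "J", "bc")` corresponds to the "Bet! Call." slide for the QJ deal.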
Background In general: Extensive-Form Game [Figure: the Kuhn Poker game tree.]
Background In general: Extensive-Form Game, Strategy Profile [Figure: the same tree with a probability on every action, e.g. player 1 checks with probability .6 and bets with probability .4.]
Background In general: Nash equilibrium: “No one can change their strategy and do any better.” Extensive-Form Game → Nash Equilibrium Strategy Profile
Background In general: Nash equilibrium: “No one can change their strategy and do any better.” Every game has a Nash equilibrium. Extensive-Form Game → Nash Equilibrium Strategy Profile [Figure: an example with probabilities 1/3, 1/3, 1/3, followed on the next build by a “?”.]
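The quoted definition can be checked mechanically in a small two-player zero-sum matrix game. The sketch below assumes rock-paper-scissors as the example (the slide shows only the probabilities 1/3, 1/3, 1/3 and does not name a game), and the function names are hypothetical.

```python
# Row player's payoffs in rock-paper-scissors (an assumed example game).
# The column player receives the negative of each entry.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def row_action_values(A, col_strategy):
    """Expected row payoff of each pure row action against col_strategy."""
    return [sum(a * q for a, q in zip(row, col_strategy)) for row in A]

def col_action_values(A, row_strategy):
    """Expected row payoff when the column player plays each pure column."""
    return [sum(A[i][j] * row_strategy[i] for i in range(len(A)))
            for j in range(len(A[0]))]

def is_nash_zero_sum(A, row_strategy, col_strategy, tol=1e-9):
    """True if neither player gains more than tol by deviating alone."""
    row_vals = row_action_values(A, col_strategy)
    value = sum(p * v for p, v in zip(row_strategy, row_vals))
    row_gain = max(row_vals) - value                       # row maximises
    col_gain = value - min(col_action_values(A, row_strategy))  # column minimises
    return row_gain <= tol and col_gain <= tol

uniform = [1 / 3, 1 / 3, 1 / 3]
print(is_nash_zero_sum(A, uniform, uniform))
print(is_nash_zero_sum(A, [1.0, 0.0, 0.0], uniform))
```

Under this assumed game, the uniform profile passes the check, while always playing the first action does not, because the column player can then switch and do better.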
Outline of Presentation ● Background – Counterfactual Regret Minimization (CFR) ● Theoretical Advancements for CFR in: – Many player games – Imperfect recall games ● CFR Speed-Ups ● Tricks for CFR with Memory Limitations ● Conclusion + Future Work
CFR ● “The alpha-beta search of imperfect information games.” ● Offline algorithm ● Iterative, “self-play” ● For each iteration, update action probabilities at every information set. [Figure: the Kuhn Poker tree. Every action starts at probability .5; after an update the probabilities change, e.g. player 1's check / bet becomes .7 / .3.]
CFR Average Strategy Profile = (Strategy 1 + Strategy 2 + ... + Strategy T) / T, where T = number of iterations. As T → ∞, the Average Strategy Profile → Nash Equilibrium Strategy Profile.
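The iteration and averaging described above can be sketched on Kuhn Poker. This is a minimal, unoptimized illustration and not the presenter's implementation: it assumes regret matching as the per-information-set update, weights each iteration's strategy by the acting player's own reach probability when averaging (the usual CFR convention, which the slide's plain average simplifies), and omits the constant 1/6 chance factor because it scales all regrets equally. All names are illustrative.

```python
from itertools import permutations

RANK = {"J": 0, "Q": 1, "K": 2}
TERMINALS = {"cc", "bf", "bc", "cbf", "cbc"}  # c = check/call, b = bet, f = fold

def payoff_p1(cards, history):
    """Player 1's payoff at a terminal history; player 2 gets the negative."""
    if history == "bf":
        return 1                       # player 2 folded to a bet
    if history == "cbf":
        return -1                      # player 1 folded to a bet
    stake = 1 if history == "cc" else 2
    return stake if RANK[cards[0]] > RANK[cards[1]] else -stake

class Node:
    """Regret and strategy accumulators for one information set."""
    def __init__(self, actions):
        self.actions = actions
        self.regret_sum = [0.0] * len(actions)
        self.strategy_sum = [0.0] * len(actions)
        self.strategy = [1.0 / len(actions)] * len(actions)

    def refresh_strategy(self):
        """Regret matching: play in proportion to positive cumulative regret."""
        positive = [max(r, 0.0) for r in self.regret_sum]
        total, n = sum(positive), len(positive)
        self.strategy = [p / total for p in positive] if total > 0 else [1.0 / n] * n

    def average_strategy(self):
        total, n = sum(self.strategy_sum), len(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [1.0 / n] * n

def cfr(cards, history, reach, nodes):
    """Return player 1's expected payoff and accumulate regrets on the way."""
    if history in TERMINALS:
        return payoff_p1(cards, history)
    player = len(history) % 2
    actions = "fc" if history.endswith("b") else "cb"
    node = nodes.setdefault(cards[player] + history, Node(actions))
    values = []
    for action, prob in zip(node.actions, node.strategy):
        next_reach = list(reach)
        next_reach[player] *= prob
        values.append(cfr(cards, history + action, next_reach, nodes))
    node_value = sum(p * v for p, v in zip(node.strategy, values))
    sign = 1 if player == 0 else -1    # values are from player 1's view
    for i in range(len(node.actions)):
        node.regret_sum[i] += reach[1 - player] * sign * (values[i] - node_value)
        node.strategy_sum[i] += reach[player] * node.strategy[i]
    return node_value

def train(iterations):
    nodes = {}
    deals = list(permutations("JQK", 2))   # six deals, probability 1/6 each
    for _ in range(iterations):
        for node in nodes.values():
            node.refresh_strategy()        # fix this iteration's strategy profile
        for cards in deals:
            cfr(cards, "", [1.0, 1.0], nodes)
    return nodes

def average_value(cards, history, nodes):
    """Player 1's expected payoff when both players use the average strategy."""
    if history in TERMINALS:
        return payoff_p1(cards, history)
    node = nodes[cards[len(history) % 2] + history]
    return sum(p * average_value(cards, history + a, nodes)
               for a, p in zip(node.actions, node.average_strategy()))

if __name__ == "__main__":
    trained = train(5000)
    for key in sorted(trained):
        print(key, [round(p, 3) for p in trained[key].average_strategy()])
```

Each key is the acting player's card followed by the history so far, so `"Kb"` is player 2 holding K and facing a bet. The printed average strategy is the output that approaches an equilibrium in this two-player zero-sum game; the per-iteration `strategy` values need not converge.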
Background Extensive-Form Game → CFR → Nash Equilibrium Strategy Profile
Background Kuhn Poker → CFR → Nash Equilibrium Strategy Profile
Background Texas Hold'em (>10^14 information sets) → CFR → Nash Equilibrium Strategy Profile (> 5 million GB)
Background Large Extensive-Form Game → ? → Nash Equilibrium Strategy Profile
Background Large Extensive-Form Game → Abstract Game
Background ● Merge card deals into buckets. Extensive-Form Game → Abstract Game
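Merging card deals into buckets can be sketched as grouping hands by a strength score. The slide does not say how buckets are chosen, so the equal-size, rank-ordered split below is only one assumed scheme, and the hand names and strength values are placeholders rather than computed equities.

```python
def bucket_hands(strength, num_buckets):
    """Map each hand to a bucket index by rank order of its strength.

    strength: dict from hand label to a score (higher is stronger).
    Hands are sorted by score and split into num_buckets groups of
    (nearly) equal size; bucket 0 holds the weakest hands.
    """
    ordered = sorted(strength, key=strength.get)
    return {hand: i * num_buckets // len(ordered)
            for i, hand in enumerate(ordered)}

# Placeholder scores for illustration only.
example_strength = {"72o": 0.35, "T9s": 0.54, "AKs": 0.67, "AA": 0.85}
print(bucket_hands(example_strength, 2))
```

In the abstract game, a player's information set would then be keyed by the bucket index instead of the exact cards, which is what shrinks the number of information sets.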
Background Extensive-Form Game (>10^14) → Abstract Game (≈10^9)
Background Extensive-Form Game (>10^14) → Abstract Game (≈10^9) → CFR → Abstract Game Equilibrium Strategy
Background Extensive-Form Game (>10^14) → Abstract Game (≈10^9) → CFR → Abstract Game Equilibrium Strategy, used as an Approximate Full Game Equilibrium Strategy (≈100 GB)
Theory – Many Player Games Extensive-Form Game → CFR → Nash Equilibrium Strategy Profile [stamped “LIAR!”]
Theory – Many Player Games 2-player Zero-Sum Game → CFR → Nash Equilibrium Strategy Profile. 3-or-more Player Game → CFR → ? (Not equilibrium)
Theory – Many Player Games Annual Computer Poker Competition 3-Player Limit Texas Hold'em - 2009
Agent            Total Bankroll (mbb/g)
Hyperborean3p     319 ± 2
dpp               171 ± 2
akuma             151 ± 2
CMURingLimit      -37 ± 2
dcu3pl            -63 ± 2
Bluechip         -548 ± 2
3-player Limit Texas Hold'em → CFR → Good strategy? (Not equilibrium)
Theory – Many Player Games [Figure: the Kuhn Poker tree for the deals QJ and QK.]
Theory – Many Player Games Dominated Strategies [Figure: dominated actions highlighted in the tree: player 2 calling a bet after QJ and player 2 folding to a bet after QK.]
Theory – Many Player Games [Figure: the tree with those dominated actions removed.]
Theory – Many Player Games Iteratively Dominated Strategy [Figure: an action in the reduced tree that is dominated only once the dominated actions have been removed.]
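The removal process shown on the tree can be illustrated in a simpler setting. The sketch below assumes a two-player matrix game and checks strict dominance by pure strategies only; it is not the extensive-form definition used in the thesis, and the function names and payoff matrices are made up for illustration.

```python
def strictly_dominated(payoff, own, others):
    """Strategies in `own` strictly dominated by another pure strategy in `own`.

    payoff(s, o) is the player's payoff for playing s against opponent choice o.
    """
    return {s for s in own
            if any(all(payoff(t, o) > payoff(s, o) for o in others)
                   for t in own if t != s)}

def iterated_elimination(A, B):
    """Repeatedly remove strictly dominated rows and columns.

    A[i][j] and B[i][j] are the row and column player's payoffs.
    Returns the sets of surviving row and column indices.
    """
    rows, cols = set(range(len(A))), set(range(len(A[0])))
    while True:
        bad_rows = strictly_dominated(lambda i, j: A[i][j], rows, cols)
        bad_cols = strictly_dominated(lambda j, i: B[i][j], cols, rows)
        if not bad_rows and not bad_cols:
            return rows, cols
        rows -= bad_rows
        cols -= bad_cols

# Row 1 is dominated outright; column 1 is dominated only after row 1 is gone.
A = [[2, 3],
     [1, 2]]
B = [[1, 0],
     [0, 1]]
print(iterated_elimination(A, B))
```

In this made-up game, column 1 plays the role of the iteratively dominated strategy: it survives the first round and is removed in the second.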
Theory – Many Player Games 3-or-more Player Game → CFR → Average Strategy Profile. As T → ∞: No Iteratively Dominated Strategies (New!) [G., arXiv ePrints 2013]
Theory – Many Player Games 3-or-more Player Game → CFR → Average Strategy Profile. As T → ∞: No Iteratively Dominated Strategies (New!). 3-or-more Player Game → CFR → “Current” Strategy Profile. For finite T: No Iteratively Dominated Strategies (New!) [G., arXiv ePrints 2013]
Theory – Many Player Games 3-Player Limit Texas Hold'em - 2012 New!
Imperfect Recall Extensive-Form Game → Abstract Game → CFR → Abstract Game Equilibrium Strategy [stamped “LIAR!” twice]
Imperfect Recall “Perfect Recall” Abstract Game → CFR → Abstract Game Equilibrium Strategy. “Imperfect Recall” Abstract Game → CFR → ? (Not equilibrium)
Imperfect Recall Pre-flop
Imperfect Recall Pre-flop Flop
Imperfect Recall Imperfect Recall Abstract Game
Imperfect Recall Perfect Recall Abstract Game