Recent Advances in Computer Poker and Future Research for Artificial Intelligence in Video Games Richard Gibson SIAT Faculty Search Presentation February 28, 2013
One Slide Summary ● 2009 – 2013: Computer Poker Research ● Future: AI in Video Games (image sources: co-optimus.com, arcadelearningenvironment.org)
Outline of Presentation ● Computer Poker Primer – Motivation – Background ● New Contributions to Computer Poker – Research + Hyperborean3p ● Future Research – AI in Video Games – StarCraft AI, ALE, automated content generation ● Teaching Interests – Game design, AI in video games
Why Poker Research? ● Classic games, such as chess and checkers, have: – Deterministic play – Binary outcomes (win or loss, plus draw) – Perfect information (image sources: spectrum.ieee.org, Wikipedia)
Why Poker Research? ● However, poker is a game with: – Stochastic elements [Figure: the many possible flops; image source: Wikipedia] – Varying outcomes [Figure: three pots, Pot 1 to Pot 3; image source: ebaumsworld.com] – Imperfect information [Figure: hidden cards shown as “?”]
Why Poker Research? ● Poker research is applicable in other areas: – Airport security [Pita et al., AI Magazine 2009] – Adaptive treatment strategies [Chen and Bowling, NIPS 2012] – Sequential auctions [?]
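Poker Research Background ● Model poker as an extensive-form game: [Figure: game tree. A chance node (c) deals QJ or QK with probability 0.5 each. Player 1 then checks (c) or bets (b). Player 2 responds with check (c) or bet (b) after a check, or with fold (f) or call (c) after a bet. After check-bet, player 1 folds (f) or calls (c). Leaves give player 1's payoff, between -2 and +2.]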
Poker Research Background ● Information sets: sets of states a player cannot distinguish between. [Figure: the same game tree over the deals QJ and QK. Player 1 holds Q under both deals, so player 1's decision nodes under QJ and QK belong to the same information set.]
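The grouping of states into information sets can be sketched in a few lines. This is a minimal illustration, assuming the two-deal tree from the slide (player 1 always holds Q; player 2 holds J or K); the names `DECISION_POINTS`, `info_set_key`, and `build_info_sets` are hypothetical and not from the talk.

```python
from collections import defaultdict

# Each state is (deal, betting history). deal[0] is player 1's card and
# deal[1] is player 2's card; the two deals match the slide's tree.
DEALS = ["QJ", "QK"]

# Histories at which someone must act, and who acts there
# (0 = player 1, 1 = player 2). "c" = check, "b" = bet.
DECISION_POINTS = {"": 0, "c": 1, "b": 1, "cb": 0}


def info_set_key(deal, history):
    """What the acting player observes: own card plus the public betting."""
    player = DECISION_POINTS[history]
    return (player, deal[player], history)


def build_info_sets():
    """Group every decision state by the acting player's observation."""
    info_sets = defaultdict(list)
    for deal in DEALS:
        for history in DECISION_POINTS:
            info_sets[info_set_key(deal, history)].append((deal, history))
    return dict(info_sets)


if __name__ == "__main__":
    for key, states in sorted(build_info_sets().items()):
        print(key, states)
```

Player 1's two information sets each contain two states (one per deal), while each of player 2's four information sets contains a single state, because player 2's own card identifies the deal in this fragment.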
Poker Research Background ● Example: Kuhn Poker [Figure: animated walkthrough of one hand. One player bets (“Bet!”); the other, facing a hidden card (“?”), weighs “Fold?” against “Call?” and then calls (“Call.”); at the showdown one player loses (-2) and the other wins (+2).]
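Poker Research Background ● Example: Kuhn Poker [Figure: the Kuhn Poker game tree. Chance (c) deals QJ or QK with probability 0.5 each; player 1 checks (c) or bets (b); player 2 checks or bets after a check, or folds (f) or calls (c) after a bet; after check-bet, player 1 folds or calls. Leaves give player 1's payoff, between -2 and +2.]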
Poker Research Background ● Extensive-Form Game → Strategy Profile
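Poker Research Background ● A strategy profile maps each information set to a probability distribution over actions. [Figure: the Kuhn Poker tree annotated with one profile. Player 1 with Q: check 0.6, bet 0.4; after check-bet: fold 0.7, call 0.3. Player 2 with J: after a check, check 0.8, bet 0.2; after a bet, fold 1, call 0. Player 2 with K: after a check, check 0, bet 1; after a bet, fold 0, call 1.]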
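A strategy profile of this kind can be written as a dictionary from information sets to action probabilities, and its expected value computed by walking the tree. The sketch below assumes the two-deal tree and the probabilities as read off the slide's figure (the fold/call split of 0.7/0.3 for player 1 after check-bet is one reading of the garbled labels); the leaf payoffs follow standard Kuhn Poker rules (ante 1, bet 1), and all identifiers are hypothetical.

```python
# Terminal payoffs to player 1, keyed by (deal, betting history).
# History letters: c = check/call, b = bet, f = fold.
PAYOFF = {
    ("QJ", "cc"): 1, ("QJ", "cbf"): -1, ("QJ", "cbc"): 2,
    ("QJ", "bf"): 1, ("QJ", "bc"): 2,
    ("QK", "cc"): -1, ("QK", "cbf"): -1, ("QK", "cbc"): -2,
    ("QK", "bf"): 1, ("QK", "bc"): -2,
}

# Who acts at each non-terminal history (0 = player 1, 1 = player 2).
ACTING_PLAYER = {"": 0, "c": 1, "b": 1, "cb": 0}

# Strategy profile: information set (own card, history) -> action probabilities.
PROFILE = {
    ("Q", ""): {"c": 0.6, "b": 0.4},
    ("Q", "cb"): {"f": 0.7, "c": 0.3},
    ("J", "c"): {"c": 0.8, "b": 0.2},
    ("J", "b"): {"f": 1.0, "c": 0.0},
    ("K", "c"): {"c": 0.0, "b": 1.0},
    ("K", "b"): {"f": 0.0, "c": 1.0},
}


def expected_value(deal, history=""):
    """Player 1's expected payoff from this state under PROFILE."""
    if (deal, history) in PAYOFF:
        return PAYOFF[(deal, history)]
    player = ACTING_PLAYER[history]
    distribution = PROFILE[(deal[player], history)]
    return sum(prob * expected_value(deal, history + action)
               for action, prob in distribution.items())


def profile_value():
    """Average over the chance node: each deal has probability 0.5."""
    return 0.5 * expected_value("QJ") + 0.5 * expected_value("QK")


if __name__ == "__main__":
    print(round(profile_value(), 3))
```

Because player 1's strategy is keyed only by the card Q and the betting history, the same distribution is applied under both deals, which is exactly the information-set constraint.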
Poker Research Background ● What type of strategy profile do we want? – Nash equilibrium ● Example: Rock-Paper-Scissors
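Poker Research Background [Figure: Rock-Paper-Scissors game tree. Player 1 picks r, p, or s; player 2 then picks r, p, or s. Player 1's payoffs, listed by player 1's action against player 2's r, p, s: r: 0, -1, +1; p: +1, 0, -1; s: -1, +1, 0.]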
Poker Research Background ● A Nash equilibrium strategy profile for Rock-Paper-Scissors. – “No one can change their strategy and do better.” [Figure: the Rock-Paper-Scissors tree with both players choosing r, p, and s with probability 1/3 each.]
Poker Research Background ● A Nash equilibrium in a 2-player zero-sum game is a defensive strategy: – “I can't lose (in expectation) no matter what my opponent does.” [Figure: the Rock-Paper-Scissors tree with player 1 choosing r, p, and s with probability 1/3 each and player 2's probabilities left as “?”.]
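The defensive property can be checked numerically for Rock-Paper-Scissors. The sketch below uses the payoffs from the slide's tree; the function names are hypothetical. It shows that the uniform strategy earns zero against any opponent strategy, while a biased strategy can be exploited.

```python
ACTIONS = ["r", "p", "s"]

# Payoff to player 1 for (player 1 action, player 2 action), as on the slide.
PAYOFF = {
    ("r", "r"): 0, ("r", "p"): -1, ("r", "s"): 1,
    ("p", "r"): 1, ("p", "p"): 0, ("p", "s"): -1,
    ("s", "r"): -1, ("s", "p"): 1, ("s", "s"): 0,
}


def expected_payoff(strategy1, strategy2):
    """Player 1's expected payoff when both players use mixed strategies."""
    return sum(strategy1[a] * strategy2[b] * PAYOFF[(a, b)]
               for a in ACTIONS for b in ACTIONS)


def best_response_value(strategy2):
    """Best payoff player 1 can obtain with a pure action against strategy2."""
    values = []
    for choice in ACTIONS:
        pure = {a: (1.0 if a == choice else 0.0) for a in ACTIONS}
        values.append(expected_payoff(pure, strategy2))
    return max(values)


if __name__ == "__main__":
    uniform = {a: 1.0 / 3.0 for a in ACTIONS}
    biased = {"r": 0.5, "p": 0.25, "s": 0.25}
    print(best_response_value(uniform))  # nothing to gain against uniform
    print(best_response_value(biased))   # paper exploits the rock bias
```

Against the uniform strategy, every pure action has expected payoff zero, so no opponent strategy (pure or mixed) can do better than break even.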
Poker Research Background ● Extensive-Form Game → ? → Nash Equilibrium Strategy Profile
Poker Research Background ● Use minimax (alpha-beta) search to compute Nash? [Figure: the Kuhn Poker game tree annotated with the strategy profile probabilities from the earlier slide; image source: clker.com]
Poker Research Background ● Instead, use Counterfactual Regret Minimization (CFR) [Zinkevich et al., NIPS 2007]. [Figure: iteration loop. Strategy Profile 1 → Deal Cards → “Play” Poker → Strategy Profile 2 → Deal Cards → “Play” Poker → ... → Strategy Profile T, repeated for T iterations (billions).]
Poker Research Background ● Instead, use Counterfactual Regret Minimization (CFR) [Zinkevich et al., NIPS 2007]. Average Strategy Profile $= \dfrac{\text{Strategy } 1 + \text{Strategy } 2 + \dots + \text{Strategy } T}{T} \;\to\;$ Nash Equilibrium Strategy Profile as $T \to \infty$.
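The idea that the average of the iterates approaches equilibrium can be illustrated on Rock-Paper-Scissors with regret matching, the per-information-set update that CFR builds on. This is a sketch, not full CFR: full CFR applies this update at every information set of the game tree using counterfactual values, and weights the average by reach probabilities. The function names and the initial bias toward rock are assumptions made for illustration.

```python
# Payoff to player 1 (rows: player 1's r, p, s; columns: player 2's r, p, s).
# Player 2 receives the negative, since the game is zero-sum.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]
NUM_ACTIONS = 3


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def self_play(iterations, initial_regrets=(1.0, 0.0, 0.0)):
    """Run regret matching for both players; return their average strategies."""
    # Hypothetical starting bias toward rock so the dynamics are not trivial.
    regrets = [list(initial_regrets), list(initial_regrets)]
    strategy_sums = [[0.0] * NUM_ACTIONS, [0.0] * NUM_ACTIONS]
    for _ in range(iterations):
        s1 = regret_matching(regrets[0])
        s2 = regret_matching(regrets[1])
        # Expected payoff of each pure action against the opponent's
        # current strategy.
        u1 = [sum(PAYOFF[a][b] * s2[b] for b in range(NUM_ACTIONS))
              for a in range(NUM_ACTIONS)]
        u2 = [sum(-PAYOFF[a][b] * s1[a] for a in range(NUM_ACTIONS))
              for b in range(NUM_ACTIONS)]
        v1 = sum(s1[a] * u1[a] for a in range(NUM_ACTIONS))
        v2 = sum(s2[b] * u2[b] for b in range(NUM_ACTIONS))
        for a in range(NUM_ACTIONS):
            regrets[0][a] += u1[a] - v1   # regret for not having played a
            regrets[1][a] += u2[a] - v2
            strategy_sums[0][a] += s1[a]
            strategy_sums[1][a] += s2[a]
    return [[x / iterations for x in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for average in self_play(10_000):
        print([round(p, 3) for p in average])
```

The first iterate is pure rock for both players, yet the averages over many iterations should end up near (1/3, 1/3, 1/3), which is the point of averaging the strategy profiles rather than using the last one.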
Poker Research Background ● Extensive-Form Game → CFR → Nash Equilibrium Strategy Profile
Poker Research Background ● Huge problem (no pun intended): Texas Hold'em (>10^14) → CFR → Nash Equilibrium Strategy Profile (> 5 million GB)
Poker Research Background ● Extensive-Form Game → ? → Nash Equilibrium Strategy Profile
Poker Research Background ● Merge card deals into buckets. ● Extensive-Form Game → Abstract Game
Poker Research Background ● Extensive-Form Game (>10^14) → Abstract Game (≈10^9)
Poker Research Background ● Extensive-Form Game (>10^14) → Abstract Game (≈10^9) ● Abstract Game → CFR → Abstract Game Equilibrium Strategy [Figure: CFR loop on the abstract game: Abstract Strategy Profile → Deal Buckets → “Play” “Poker”, billions of times.]
Poker Research Background ● Extensive-Form Game (>10^14) → Abstract Game (≈10^9) ● Abstract Game Equilibrium Strategy (≈100 GB) → Approximate Full Game Equilibrium Strategy
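Merging card deals into buckets can be sketched as follows. The talk does not say how buckets are formed, so this example assumes a simple percentile scheme over a hand-strength score; the function name `assign_buckets`, the hands, and the strength values are all made up for illustration.

```python
def assign_buckets(strengths, num_buckets):
    """Map each deal to a bucket index by its rank in the strength ordering.

    strengths: dict from deal label to a numeric strength score.
    Deals are sorted from weakest to strongest and split into
    num_buckets groups of (nearly) equal size.
    """
    ranked = sorted(strengths, key=strengths.get)
    buckets = {}
    for rank, deal in enumerate(ranked):
        buckets[deal] = rank * num_buckets // len(ranked)
    return buckets


if __name__ == "__main__":
    # Hypothetical strength scores, not taken from the talk.
    toy_strengths = {
        "AA": 0.85, "KK": 0.82, "QJ": 0.55,
        "T9": 0.50, "72": 0.30, "32": 0.28,
    }
    print(assign_buckets(toy_strengths, num_buckets=3))
```

In the abstract game, the player observes only the bucket index rather than the cards, so all deals in one bucket share the same information sets and therefore the same strategy.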
Contribution 1: Domination
Domination ● 3-or-more Player Abstract Game → CFR → ? (Not equilibrium)
Domination ● Annual Computer Poker Competition, 3-Player Limit Texas Hold'em, 2009
Agent            Total Bankroll (mbb/g)
Hyperborean3p     319 ± 2
dpp               171 ± 2
akuma             151 ± 2
CMURingLimit      -37 ± 2
dcu3pl            -63 ± 2
Bluechip         -548 ± 2
● 3-or-more Player Abstract Game → CFR → ? (Not equilibrium)
Domination [Figure sequence on the Kuhn Poker game tree (deals QJ and QK): (1) the full tree; (2) the same tree with “Dominated Strategies” marked; (3) the tree with those actions removed, namely player 2 calling a bet with J (leaf +2) and player 2 folding to a bet with K (leaf +1); (4) the reduced tree with an “Iteratively Dominated Strategy” marked.]
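The removal process shown in the figure sequence (delete dominated strategies, then look again for strategies that have become dominated) can be sketched for a small two-player matrix game. This is an illustration under simplifying assumptions: it only checks strict domination by pure strategies, the function names are hypothetical, and the example matrix is made up rather than taken from Kuhn Poker.

```python
def strictly_dominated(payoff, own, other, candidate):
    """True if another remaining strategy beats `candidate` against
    every remaining opponent strategy."""
    return any(
        all(payoff(alt, opp) > payoff(candidate, opp) for opp in other)
        for alt in own if alt != candidate
    )


def iterated_elimination(u1, u2):
    """Iteratively remove strictly dominated pure strategies.

    u1[i][j], u2[i][j]: payoffs to the row and column player when the
    row player picks i and the column player picks j.
    Returns the surviving row and column indices.
    """
    rows = list(range(len(u1)))
    cols = list(range(len(u1[0])))
    changed = True
    while changed:
        changed = False
        for r in list(rows):
            if strictly_dominated(lambda a, o: u1[a][o], rows, cols, r):
                rows.remove(r)
                changed = True
        for c in list(cols):
            if strictly_dominated(lambda a, o: u2[o][a], cols, rows, c):
                cols.remove(c)
                changed = True
    return rows, cols


if __name__ == "__main__":
    # Made-up 2x3 game: rows U, D; columns L, M, R.
    row_payoffs = [[1, 1, 0],
                   [0, 0, 2]]
    col_payoffs = [[0, 2, 1],
                   [3, 1, 0]]
    print(iterated_elimination(row_payoffs, col_payoffs))
```

In this example no row is dominated at first; row D only becomes dominated after column R is removed, which is what makes it iteratively dominated rather than dominated outright.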
Domination ● 3-or-more Player Abstract Game → CFR → Average Strategy Profile; as T → ∞: No Iteratively Dominated Strategies (New!) [G., submitted to EC 2013]
Domination ● 3-or-more Player Abstract Game → CFR → Average Strategy Profile; as T → ∞: No Iteratively Dominated Strategies (New!) ● 3-or-more Player Abstract Game → CFR → “Current” Strategy Profile; for finite T: No Iteratively Dominated Strategies (New!) [G., submitted to EC 2013]
Domination 3-Player Limit Texas Hold'em - 2012 New! [G., submitted to EC 2013]
Contribution 2: Strategy Stitching
Strategy Stitching
Game                            Full game   Abstract game   “Turn” deals    “Turn” buckets
2-player Limit Texas Hold'em    ≈10^14      ≈10^9           ≈59,000,000     540,000
3-player Limit Texas Hold'em    ≈10^17      ≈10^9           ≈59,000,000     540
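A quick arithmetic check of how much coarser the 3-player abstraction is, using only the approximate counts from the slide. The constant and function names are hypothetical.

```python
TURN_DEALS = 59_000_000  # approximate number of "Turn" deals (from the slide)


def deals_per_bucket(num_buckets):
    """Average number of distinct "Turn" deals merged into one bucket."""
    return TURN_DEALS / num_buckets


if __name__ == "__main__":
    print(round(deals_per_bucket(540_000)))  # 2-player abstraction
    print(round(deals_per_bucket(540)))      # 3-player abstraction
```

With the same abstract game size of about 10^9, the 3-player abstraction merges roughly a thousand times more deals into each bucket than the 2-player one (about 109,000 deals per bucket versus about 109).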