Finding Optimal Abstract Strategies in Extensive-Form Games
Mike Johanson, Nolan Bard, Neil Burch, Michael Bowling
University of Alberta, Canada (University of Alberta Computer Poker Research Group)
July 25th, 2012 :: AAAI 2012, Toronto
Tuesday, November 13, 2012
2-Player Limit Texas Hold’em Poker: Distance from Perfect Play. [Figure: exploitability (mbb/g, 0 to 400) by year, 2006 to 2011. Annotations: "AAAI 2007 Vancouver: Narrow loss to Human Pros" (275); "Las Vegas: Narrow win over Human Pros" (235); a further labeled point at 104.]
Abstraction-Solving-Translation. [Diagram: Game (10^14 decisions), Solver, Optimal Strategy.] Goal: We want to learn a strategy σ (or, in RL, a policy π) that chooses actions. Exploitability: expected loss against a perfect adversary. Nash equilibrium: unexploitable, with an expected loss of $0 per game; an optimal strategy. We want to approximate this.
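To make the exploitability definition concrete, here is a minimal sketch on a toy zero-sum matrix game rather than poker. The game (rock-paper-scissors), the function name `exploitability`, and the assumption that the game value is known to be 0 are all illustrative choices, not taken from the talk.

```python
# Rock-paper-scissors payoffs for the row player (zero-sum, game value 0).
# This toy game is an illustrative assumption; the talk is about poker.
RPS = [
    [0, -1, 1],   # rock     vs. rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def exploitability(payoff, strategy, game_value=0.0):
    """Expected loss per game of `strategy` against a perfect adversary."""
    n_cols = len(payoff[0])
    # Row player's expected payoff against each pure opponent action.
    values = [
        sum(p * row[j] for p, row in zip(strategy, payoff))
        for j in range(n_cols)
    ]
    # A perfect adversary picks the action that is worst for the row player.
    return game_value - min(values)


uniform = [1 / 3, 1 / 3, 1 / 3]   # the Nash equilibrium of this game
always_rock = [1.0, 0.0, 0.0]     # beaten every time by "paper"

print(exploitability(RPS, uniform))
print(exploitability(RPS, always_rock))
```

The uniform strategy loses nothing to any opponent, while always playing rock loses one unit per game to an opponent who always plays paper.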
Abstraction-Solving-Translation. [Diagram: Game (10^14 decisions), Solver, Optimal Strategy.] Problem: The game has 10^14 information sets. Far too large to solve! With current techniques, this would take 4 petabytes of RAM and thousands of CPU-years! If you have four petabytes of RAM, we should talk!
Abstraction-Solving-Translation. [Diagram: Game (10^14 decisions), Abstraction, Abstract Game (10^7 decisions); Solver, Optimal Strategy.] Workaround: Use state-space abstraction to make a smaller game that we can solve.
Abstraction-Solving-Translation. [Diagram: Game (10^14 decisions), Abstraction, Abstract Game (10^7 decisions); each game has a Solver; the real game's solver yields the Optimal Strategy, the abstract game's solver yields the Optimal Abstract Strategy; a Translation step links the abstract strategy back to the real game.] Solving: Use a game-solving algorithm to find an optimal strategy for the abstract game.
Abstraction-Solving-Translation. [Diagram: Game (10^14 decisions), Abstraction, Abstract Game (10^7 decisions); Solver, Optimal Strategy ≠ Solver, Optimal Abstract Strategy.] Two Types of Loss: Lossy abstraction: it may not be possible to represent an optimal strategy. Optimal Strategy ≠ Optimal Abstract Strategy: other abstract strategies might be better in the real game!
Abstract Equilibrium might not be optimal in the real game. [Figure: the Set of Strategies and the Set of Abstract Strategies, with labeled points for the Abstract Optimal Strategy, the Least Exploitable Abstract Strategy, and the Real Optimal Strategy, plus an Exploitability label.]
Abstraction-Solving-Translation. [Diagram: Game (10^14 decisions), Abstraction, Abstract Game (10^7 decisions); three Solvers producing the Optimal Strategy, the Least Exploitable Abstract Strategy, and the Optimal Abstract Strategy.] This Talk: Efficiently finding an abstract strategy with the lowest exploitability in the real game.
Counterfactual Regret Minimization (CFR), NIPS 2007. [Diagram: two players, one vs. the other.] At t = 0, σ_0 = uniform random.
Counterfactual Regret Minimization (CFR), NIPS 2007. [Diagram: two players, one vs. the other.] “Play a game”, then update using CFR: σ_0 at t = 0 becomes σ_1 at t = 1. Updating with CFR makes them regret-minimizing agents.
Counterfactual Regret Minimization (CFR), NIPS 2007. The “Current” strategy: σ_0, σ_1, σ_2 at t = 0, 1, 2. The “Average” strategy: $\hat{\sigma} = \frac{\sigma_0 + \sigma_1 + \dots + \sigma_t}{t}$.
Counterfactual Regret Minimization (CFR), NIPS 2007. Strategies σ_0, σ_1, σ_2, σ_3, σ_4, ..., σ_T at t = 0, 1, 2, 3, 4, ..., T. Key Theorem: If both players are regret-minimizing, then their average strategy converges towards an optimal strategy.
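The key theorem can be illustrated on a toy problem. The sketch below is not CFR itself: it runs plain regret matching (the per-decision update rule that CFR applies at each information set) for both players of a small zero-sum matrix game, and tracks the row player's average strategy. The biased rock-paper-scissors payoffs, the function names, and the iteration count are illustrative assumptions.

```python
# Biased rock-paper-scissors (hypothetical example): rock beating scissors pays 2.
# Entries are the row player's payoff; the column player receives the negation.
# The game is symmetric, so its value is 0.
PAYOFF = [
    [0, -1, 2],
    [1, 0, -1],
    [-2, 1, 0],
]


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)  # uniform random when no regret


def exploitability(payoff, row_strategy, game_value=0.0):
    """Row player's expected loss against a best-responding column player."""
    n_cols = len(payoff[0])
    return game_value - min(
        sum(p * row[j] for p, row in zip(row_strategy, payoff))
        for j in range(n_cols)
    )


def self_play(payoff, iterations):
    """Both players minimize regret; return the row player's average strategy."""
    n = len(payoff)  # square game: both players have n actions
    row_regret, col_regret = [0.0] * n, [0.0] * n
    row_sum = [0.0] * n
    for _ in range(iterations):
        x = regret_matching(row_regret)   # current row strategy
        y = regret_matching(col_regret)   # current column strategy
        row_sum = [s + p for s, p in zip(row_sum, x)]
        # Expected value of each pure action against the opponent's current strategy.
        row_values = [sum(payoff[i][j] * y[j] for j in range(n)) for i in range(n)]
        col_values = [-sum(x[i] * payoff[i][j] for i in range(n)) for j in range(n)]
        row_ev = sum(p * v for p, v in zip(x, row_values))
        col_ev = sum(p * v for p, v in zip(y, col_values))
        # Regret: how much better each action would have done than the current mix.
        row_regret = [r + v - row_ev for r, v in zip(row_regret, row_values)]
        col_regret = [r + v - col_ev for r, v in zip(col_regret, col_values)]
    return [s / iterations for s in row_sum]


average = self_play(PAYOFF, 20000)
print(average, exploitability(PAYOFF, average))
```

The starting uniform strategy is exploitable for 1/3 in this game, so the printed exploitability of the average strategy shows how far self-play has moved it towards the equilibrium.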
Counterfactual Regret Minimization (CFR), NIPS 2007. CFR in an abstract 10-Bucket Perfect Recall Game. [Figure: Abstract Game Exploitability (log scale, 10^-1 to 10^2) and Real Game Exploitability (260 to 340) plotted against CFR Iterations (10^3 to 10^7).]
Moving from CFR to CFR-BR in six easy steps.
Step 1: Both players abstracted. Abstracted CFR (X GB RAM) vs. abstracted CFR (X GB RAM). Both players are abstracted. Computation is efficient, but the solution is suboptimal. X is typically 1 to 100, depending on the size of the abstraction.
Step 2: Opponent is unabstracted. Abstracted CFR (100 GB RAM) vs. unabstracted CFR (140 TB RAM). [Waugh et al., 2009]: Opponent is unabstracted. Abstracted player minimizes exploitability! Requires far too much RAM and computation.
Step 3: Play against a Best Response. Abstracted CFR (100 GB RAM) vs. unabstracted Best Response (8.75 TB RAM). A Best Response is also regret-minimizing, so the average CFR strategy converges. The current CFR strategy converges, too! Takes 76 CPU-days to compute a BR.
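The idea of pairing a regret minimizer with a best-responding opponent can be sketched on a toy matrix game. This is not the talk's poker implementation: the payoffs, function names, and iteration count are illustrative assumptions, and plain regret matching stands in for CFR.

```python
# Biased rock-paper-scissors (hypothetical example), row player's payoffs.
# The game is symmetric and zero-sum, so its value is 0.
PAYOFF = [
    [0, -1, 2],
    [1, 0, -1],
    [-2, 1, 0],
]


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def column_values(payoff, row_strategy):
    """Row player's expected payoff against each pure column action."""
    n_cols = len(payoff[0])
    return [
        sum(p * row[j] for p, row in zip(row_strategy, payoff))
        for j in range(n_cols)
    ]


def exploitability(payoff, row_strategy, game_value=0.0):
    """Row player's expected loss against a best-responding column player."""
    return game_value - min(column_values(payoff, row_strategy))


def best_response(payoff, row_strategy):
    """Index of the column action that is worst for the row player."""
    values = column_values(payoff, row_strategy)
    return min(range(len(values)), key=lambda j: values[j])


def play_vs_best_response(payoff, iterations):
    """Row player minimizes regret while the opponent always best-responds."""
    n = len(payoff)
    regret = [0.0] * n
    strategy_sum = [0.0] * n
    for _ in range(iterations):
        x = regret_matching(regret)
        strategy_sum = [s + p for s, p in zip(strategy_sum, x)]
        j = best_response(payoff, x)  # opponent reacts to the current strategy
        values = [payoff[i][j] for i in range(n)]
        ev = sum(p * v for p, v in zip(x, values))
        regret = [r + v - ev for r, v in zip(regret, values)]
    return [s / iterations for s in strategy_sum]


average = play_vs_best_response(PAYOFF, 20000)
print(average, exploitability(PAYOFF, average))
```

Because the opponent best-responds to every current strategy, it never accumulates positive regret, so only the row player's regret limits how exploitable the average strategy can be.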
Step 4: Split Best Response into pieces. [Diagram: the unabstracted Best Response is split into a BR Trunk (rounds 1 and 2, 59 MB) and a BR Subgame (rounds 3 and 4, 3 MB).] Split the strategy into a Trunk and many Subgames. Big advantage of Best Response: we can compute subgames independently as needed, and never need to store all of it at once!
Step 4: Split Best Response into pieces. BR Trunk (rounds 1 and 2) plus BR Subgame (rounds 3 and 4), 59 + 3 = 62 MB RAM, vs. abstracted CFR (100 GB RAM). Compute subgames as needed, then discard. Memory problem solved! Takes 2 x 76 CPU-days, though: a first pass to compute the Trunk, and a second pass to play the game.
Step 5: Play against a CFR-BR Hybrid. CFR Trunk (rounds 1 and 2) plus BR Subgame (rounds 3 and 4), 936 + 3 ≈ 940 MB RAM, vs. abstracted CFR (100 GB RAM). Use CFR to update the Trunk strategy. This is also regret-minimizing, so CFR converges. We can query the Trunk strategy at any time, and compute the Subgame strategy as needed.
Step 6: Use Sampling to converge faster. CFR Trunk (rounds 1 and 2) plus BR Subgame (rounds 3 and 4), 940 MB RAM, vs. abstracted CFR (100 GB RAM). Sample one subgame, compute the BR, update the players. Takes 50 CPU-seconds per iteration and 940 MB RAM, and still converges!
CFR-BR: CFR Trunk plus BR Subgame (940 MB RAM) vs. abstracted CFR (100 GB RAM). Finds the least exploitable abstract strategy, while using less RAM than CFR did! Average Strategy: guaranteed to converge. Current Strategy: not guaranteed, but converges faster in practice.
Testing in a small poker game. Unabstracted [2-4] Hold’em Poker: 94 million information sets. [Figure: exploitability (mbb/g, log scale, 10^-1 to 10^3) vs. time (CPU-seconds, 10^2 to 10^7) for CFR, CFR-BR Average, and CFR-BR Current.]
Testing in a small poker game. Abstracted [2-4] Hold’em: 1790 information sets. [Figure: exploitability (mbb/g, log scale, 10^1 to 10^3) vs. time (CPU-seconds, 10^2 to 10^7) for CFR A-vs-A, CFR A-vs-U, CFR-BR Average, and CFR-BR Current; labeled values 143.932 and 81.332.]
Texas Hold’em Poker: Small Abstractions. 2007 Computer Poker Competition abstraction: 57 million information sets. (Previous best strategy: 100x larger abstraction, exploitable for 104 mbb/g.) [Figure: exploitability (mbb/g, log scale, 10^1 to 10^3) vs. time (CPU-seconds, 10^4 to 10^9) for CFR, CFR-BR Avg, and CFR-BR Cur; labeled values 305.045 and 92.638.]
Texas Hold’em Poker: Tiny Abstractions. 2-Bucket and 3-Bucket Abstractions: these fit on a 1.44 MB floppy disk! (2008 Man-vs-Machine winner: 1.25 GB, exploitable for 235 mbb/g.) [Figure: exploitability (mbb/g, log scale, 10^2 to 10^3) vs. time (CPU-seconds, 10^5 to 10^8) for 2-Bucket CFR-BR Average and 3-Bucket CFR-BR Average; labeled values 218.487 and 175.824.]
Texas Hold’em Poker: Small Abstractions. Least Exploitable Strategy Ever Made: 5.8 billion information sets. Previous best strategy, same abstraction: 104 mbb/g. [Figure: exploitability (mbb/g, log scale, 10^1 to 10^3) vs. time (CPU-seconds, 10^6 to 10^9) for Hyperborean 2011.IRO, CFR-BR Average, and CFR-BR Current; labeled values 53.7929 and 37.170.]
2-Player Limit Texas Hold’em Poker: Distance from Perfect Play. [Figure: exploitability (mbb/g, 0 to 400) by year, 2006 to 2012. Annotations: "Narrow loss to Human Pros"; "Narrow win over Human Pros"; "This Talk: CFR-BR".]