Deep Counterfactual Regret Minimization




  1. Deep Counterfactual Regret Minimization
  Noam Brown*¹², Adam Lerer*¹, Sam Gross¹, Tuomas Sandholm²³
  *Equal Contribution  ¹Facebook AI Research  ²Carnegie Mellon University  ³Strategic Machine Inc., Strategy Robot Inc., and Optimized Markets Inc.

  2. Counterfactual Regret Minimization (CFR) [Zinkevich et al. NeurIPS-07]
  • CFR is the leading algorithm for solving partially observable games
  • Iteratively converges to an equilibrium
  • Used by every top poker AI in the past 7 years, including Libratus
  • Every single one used a tabular form of CFR
  • This paper introduces a function-approximation form of CFR using deep neural networks
  • Less domain knowledge
  • Easier to apply to other games
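For readers unfamiliar with CFR, the per-decision-point update at its core is regret matching: play each action in proportion to its accumulated positive regret. The sketch below is a minimal illustration of that rule; the function name and example values are my own, not taken from the paper.

```python
# Minimal regret-matching sketch (illustrative, not the authors' code).
import numpy as np

def regret_matching(cumulative_regrets: np.ndarray) -> np.ndarray:
    """Turn cumulative action regrets into a strategy: actions are played in
    proportion to their positive regret, or uniformly if none is positive."""
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.ones_like(positive) / len(positive)

# Example: regrets [5, -2, 1] yield action probabilities [5/6, 0, 1/6].
print(regret_matching(np.array([5.0, -2.0, 1.0])))
```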

  3. Example of Monte Carlo CFR [Lanctot et al. NeurIPS-09]
  • Simulate a game with one player designated as the traverser
  (game-tree diagram)

  4. Example of Monte Carlo CFR [Lanctot et al. NeurIPS-09]
  • Simulate a game with one player designated as the traverser
  • After the game ends, the traverser sees how much better she could have done by choosing other actions
  (game-tree diagram)

  5. Example of Monte Carlo CFR [Lanctot et al. NeurIPS-09]
  • Simulate a game with one player designated as the traverser
  • After the game ends, the traverser sees how much better she could have done by choosing other actions
  • This difference is added to the action’s regret. In future iterations, actions with higher regret are chosen with higher probability
  (game-tree diagram)

  6. Example of Monte Carlo CFR [Lanctot et al. NeurIPS-09]
  • Simulate a game with one player designated as the traverser
  • After the game ends, the traverser sees how much better she could have done by choosing other actions
  • This difference is added to the action’s regret. In future iterations, actions with higher regret are chosen with higher probability
  • The process repeats even for hypothetical decision points
  (game-tree diagram)
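Slides 3–6 walk through one external-sampling Monte Carlo CFR traversal. The sketch below outlines that traversal under my own assumptions: the game interface (`is_terminal`, `payoff`, `current_player`, `legal_actions`, `next_state`, `infoset_key`) and the two tables are hypothetical stand-ins rather than the authors' implementation, and chance nodes are omitted for brevity.

```python
# One external-sampling MCCFR traversal (illustrative sketch).
import random
from collections import defaultdict

regret_sum = defaultdict(lambda: defaultdict(float))    # infoset -> action -> cumulative regret
strategy_sum = defaultdict(lambda: defaultdict(float))  # infoset -> action -> probability mass

def regret_matching(regrets, actions):
    positive = {a: max(regrets[a], 0.0) for a in actions}
    total = sum(positive.values())
    if total > 0:
        return {a: positive[a] / total for a in actions}
    return {a: 1.0 / len(actions) for a in actions}

def traverse(state, traverser):
    if state.is_terminal():
        return state.payoff(traverser)

    player = state.current_player()
    infoset = state.infoset_key(player)
    actions = state.legal_actions()
    strategy = regret_matching(regret_sum[infoset], actions)

    if player == traverser:
        # Explore every action to see how much better each would have done.
        action_values = {a: traverse(state.next_state(a), traverser) for a in actions}
        node_value = sum(strategy[a] * action_values[a] for a in actions)
        for a in actions:
            # The difference is added to the action's regret.
            regret_sum[infoset][a] += action_values[a] - node_value
        return node_value
    else:
        # Opponent nodes: record the strategy played (for the average strategy)
        # and sample a single action to continue the traversal.
        for a in actions:
            strategy_sum[infoset][a] += strategy[a]
        sampled = random.choices(actions, weights=[strategy[a] for a in actions])[0]
        return traverse(state.next_state(sampled), traverser)
```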

  7. Prior Approach: Abstraction in Games
  (figure: original game → similar situations bucketed together → abstracted game)
  • Requires extensive domain knowledge
  • Several papers written on how to do abstraction just in poker
  • Difficult to extend to other games
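As a toy illustration of the bucketing idea (not any specific abstraction algorithm from the literature), situations that a hand-crafted strength feature judges similar are mapped to one bucket and share a single strategy entry; the crude feature below is purely illustrative.

```python
# Toy card-abstraction sketch: hands with similar estimated strength share a
# bucket in the abstracted game. Real poker abstractions use far more
# elaborate, hand-tuned features; this normalized rank sum is only a placeholder.
def hand_strength(ranks):                 # ranks: card ranks, each in 2..14
    return sum(ranks) / (14.0 * len(ranks))

def bucket(ranks, num_buckets=50):
    return min(int(hand_strength(ranks) * num_buckets), num_buckets - 1)

# Distinct hands with similar rank totals land in the same bucket.
print(bucket([14, 13]), bucket([13, 14]), bucket([7, 2]))
```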

  8. Deep CFR
  • Input: low-level features (visible cards, observed actions)
  • Output: estimate of action regrets
  • On each iteration:
    1. Collect samples of action regrets, add to a buffer
    2. Train a network to predict regrets
    3. Use network’s regret estimates to play on next iteration
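The following is a minimal, self-contained sketch of steps 2 and 3 of this loop, under my own assumptions about feature dimensions, architecture, and training (the paper's actual network and procedure differ): fit a small network to buffered (infoset features, sampled regrets) pairs, then derive the next iteration's strategy by regret matching over the predicted regrets.

```python
# Sketch of Deep CFR's "train regret net, then play from its estimates" steps.
import torch
import torch.nn as nn

FEATURE_DIM, NUM_ACTIONS = 64, 3

# Step 1 would fill this buffer during game traversals; random placeholder data here.
features = torch.randn(10_000, FEATURE_DIM)
sampled_regrets = torch.randn(10_000, NUM_ACTIONS)

regret_net = nn.Sequential(nn.Linear(FEATURE_DIM, 128), nn.ReLU(), nn.Linear(128, NUM_ACTIONS))
opt = torch.optim.Adam(regret_net.parameters(), lr=1e-3)

# Step 2: train the network to predict sampled regrets from infoset features.
for _ in range(1_000):
    idx = torch.randint(0, features.shape[0], (512,))
    loss = nn.functional.mse_loss(regret_net(features[idx]), sampled_regrets[idx])
    opt.zero_grad(); loss.backward(); opt.step()

# Step 3: on the next iteration, play by regret matching over predicted regrets.
def strategy_from_net(infoset_features: torch.Tensor) -> torch.Tensor:
    predicted = regret_net(infoset_features).clamp(min=0.0)
    total = predicted.sum(dim=-1, keepdim=True)
    uniform = torch.full_like(predicted, 1.0 / NUM_ACTIONS)
    return torch.where(total > 0, predicted / total.clamp(min=1e-12), uniform)
```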

  9. Deep CFR
  • Input: low-level features (visible cards, observed actions)
  • Output: estimate of action regrets
  • On each iteration:
    1. Collect samples of action regrets, add to a buffer
    2. Train a network to predict regrets
    3. Use network’s regret estimates to play on next iteration
  • Theorem: With arbitrarily high probability, Deep CFR converges to an ε-Nash equilibrium in two-player zero-sum games, where ε is determined by prediction error
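For context on the theorem, the standard link between regret and equilibrium quality in two-player zero-sum games is the following well-known result, stated here in general form rather than as the paper's specific error bound:

```latex
% If both players' average regret over T iterations is at most \epsilon,
% the average strategy profile is a 2\epsilon-Nash equilibrium.
\[
  \max_{\sigma_p'} \frac{1}{T} \sum_{t=1}^{T}
  \Bigl( u_p(\sigma_p', \sigma_{-p}^{t}) - u_p(\sigma^{t}) \Bigr) \le \epsilon
  \;\;\text{for } p \in \{1, 2\}
  \quad \Longrightarrow \quad
  \bar{\sigma}^{T} \text{ is a } 2\epsilon\text{-Nash equilibrium.}
\]
```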

  10. Experimental results in limit Texas hold’em
  • Deep CFR produces superhuman performance in heads-up limit Texas hold’em poker
  • ~10 trillion decision points
  • Once played competitively by humans
  • Deep CFR outperforms Neural Fictitious Self-Play (NFSP), the prior best deep RL algorithm for partially observable games [Heinrich & Silver arXiv-15]
  • Deep CFR is also much more sample efficient
  • Deep CFR is competitive with domain-specific abstraction algorithms

  11. Conclusions
  • Among non-tabular algorithms for solving partially observable games, Deep CFR is the fastest and most sample-efficient, and it produces the best results
  • Uses less domain knowledge than abstraction-based approaches, making it easier to apply to other games
