Low-Variance and Zero-Variance Baselines in Extensive-Form Games
Trevor Davis 2,*, Martin Schmid 1, Michael Bowling 1,2
*Work done during an internship at DeepMind
Monte Carlo game solving Extensive-form games (EFGs)
Baseline functions - evaluating unsampled actions
Our contribution
Building on VR-MCCFR (Schmid et al., AAAI 2019), this work provides:
● Lower variance, faster convergence
● Provable zero-variance samples
Monte Carlo evaluation
Unbiased updates at h: sample a single action a at h from a sampling distribution q, then divide the sampled value by q(a).
Unsampled actions: receive an estimated value of 0.
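The sampled-value construction above can be written out as follows (a reconstruction in standard MCCFR notation, not copied from the slides: a is the action sampled at h from the sampling distribution q, and v̂(ha) is the recursively estimated value after taking a):

$$
\hat{v}(h, a') =
\begin{cases}
\dfrac{\hat{v}(ha')}{q(a')} & \text{if } a' = a,\\[4pt]
0 & \text{otherwise.}
\end{cases}
$$

Taking the expectation over the sampled action gives $\mathbb{E}_{a \sim q}[\hat{v}(h, a')] = \hat{v}(ha')$ for every action $a'$, which is the unbiasedness claimed on the slide.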
Baseline functions
Evaluation with baseline
Without baseline: only the sampled action contributes; unsampled actions are estimated as 0.
Baseline correction: add a baseline value b(h, a) to every action's estimate and subtract it, importance-weighted, from the sampled action (a control variate).
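Written out (a hedged reconstruction consistent with the sampling scheme above; b(h, a) is the baseline and $\mathbb{1}\{\cdot\}$ indicates the sampled action):

$$
\hat{v}_b(h, a') \;=\; b(h, a') \;+\; \frac{\mathbb{1}\{a' = a\}}{q(a')}\,\bigl(\hat{v}_b(ha') - b(h, a')\bigr).
$$

The added and subtracted baseline terms cancel in expectation, $\mathbb{E}_{a \sim q}[\hat{v}_b(h, a')] = \hat{v}_b(ha')$, so this is a control variate: it changes no expectation, but the closer $b(h, a')$ is to $\hat{v}_b(ha')$, the smaller the variance, and unsampled actions now receive the informative estimate $b(h, a')$ instead of 0.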
Theoretical results
Theorem 1: the baseline-corrected values are unbiased.
Theorem 2: each baseline-corrected value has variance bounded by a sum of squared baseline prediction errors over the subtree rooted at a.
Baseline function selection
We want b(h, a) to approximate the expected sampled value of ha.
Learned history baseline: set b(h, a) to the average of the previous sampled values at ha.
Note: the sampled values depend on the players' strategies, which change between iterations, so the process is not stationary; b(h, a) is therefore not an unbiased estimate of the current expectation. The baseline-corrected values are still unbiased regardless.
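A minimal sketch of the learned history baseline and the corrected-value computation, assuming the estimator forms described above. All class and function names here are illustrative, not from the paper's code.

```python
# Sketch: learned history baseline as a running average of sampled values,
# plus baseline-corrected estimates for every action at a history h.
# Names and interfaces are hypothetical; the paper's implementation may differ.
from collections import defaultdict

class LearnedHistoryBaseline:
    def __init__(self):
        self.totals = defaultdict(float)   # sum of sampled values per (h, a)
        self.counts = defaultdict(int)     # number of samples per (h, a)

    def value(self, h, a):
        """Current baseline b(h, a); 0 before any sample (still unbiased)."""
        n = self.counts[(h, a)]
        return self.totals[(h, a)] / n if n else 0.0

    def update(self, h, a, sampled_value):
        """Fold one sampled value for ha into the running average."""
        self.totals[(h, a)] += sampled_value
        self.counts[(h, a)] += 1

def corrected_values(baseline, h, actions, sampled_action, sample_prob, sampled_value):
    """Baseline-corrected estimates for every action at history h.

    Unsampled actions get b(h, a); the sampled action gets
    b(h, a) + (sampled_value - b(h, a)) / q(a), so expectations are unchanged.
    """
    estimates = {}
    for a in actions:
        b = baseline.value(h, a)
        if a == sampled_action:
            estimates[a] = b + (sampled_value - b) / sample_prob
        else:
            estimates[a] = b
    return estimates
```

For example, after two samples with values 2.0 and 4.0 at (h, a), the baseline is 3.0; a new sample of 5.0 drawn with probability 0.5 yields the corrected estimate 3.0 + (5.0 - 3.0)/0.5 = 7.0 for the sampled action, while unsampled actions keep their baseline values.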
Baseline convergence evaluation
[Figure: convergence on Leduc poker with Monte Carlo Counterfactual Regret Minimization (MCCFR+); curves for no baseline, VR-MCCFR (Schmid et al.), and the learned history baseline.]
Predictive baseline
Updating with the learned history baseline fits the baseline to values sampled under past strategies, but the optimal baseline depends on the strategy update itself.
Predictive baseline: use the updated strategy to set the baseline, recursively, from the leaves of the sampled trajectory upward.
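One way to write the recursion (an illustrative reconstruction, not copied from the slides; $\sigma^{t+1}$ denotes the strategy after the regret update at $ha$, and the sum runs over the actions $a'$ available at $ha$):

$$
b^{t+1}(h, a) \;=\; \sum_{a'} \sigma^{t+1}(ha, a')\, \hat{v}_b(ha, a'),
$$

applied bottom-up along the sampled trajectory, so that the baseline at each sampled history matches the value estimate of the strategy that will actually be played on the next iteration.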
Zero-variance updates
If:
● we use the predictive baseline,
● we sample public outcomes, and
● all outcomes are sampled at least once,
Theorem: the baseline-corrected values have zero variance.
Baseline variance evaluation
[Figure: variance on Leduc poker with Monte Carlo Counterfactual Regret Minimization (MCCFR+); curves for no baseline, VR-MCCFR (Schmid et al.), the learned history baseline, and the predictive baseline.]
Conclusion
● Lower variance, faster convergence
● Provable zero-variance samples