Open-ended learning in symmetric zero-sum games

Oct 26, 2022

David Balduzzi, Marta Garnelo, Yoram Bachrach, Wojciech M. Czarnecki, Julien Perolat, Max Jaderberg, Thore Graepel
Long ago and far away (mid-1800s in Cambridge, England):
First tutor: “I'm teaching the most brilliant boy in Britain.”
Second tutor: “Well, I'm teaching the best test-taker.”
Depending on the version of the story, the first boy was either Lord Kelvin or James Clerk Maxwell. The second boy indeed scored highest on the Mathematical Tripos, but is otherwise long forgotten.
Modern learning algorithms are outstanding test-takers. But intelligence is about more than taking tests. It’s also about formulating useful problems.
Where do problems come from?
Answer #1: Someone packages a dataset into a loss function, e.g. ImageNet, CIFAR, MNIST, …
Answer #2: Someone builds a task (that is, an environment sprinkled with rewards), e.g. Arcade Learning Environment, DM-Lab, OpenAI Gym, …
Answer #3: Self-play in symmetric zero-sum games. The agent is the task: create an outer loop that bends deep RL on itself.
(Naive) self-play is an open-ended learning algorithm. It’s pretty amazing, but … there are really simple examples where it completely breaks down. It’s not a general-purpose learning algorithm, not even for zero-sum games.
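The slides do not say which simple examples they have in mind. As an illustration only, the sketch below assumes rock-paper-scissors and models naive self-play as "replace the agent with a best response to its latest self". The names `PAYOFF`, `best_response`, and `naive_self_play` are hypothetical, and real self-play trains a neural agent rather than picking pure strategies.

```python
# Assumed example game: rock-paper-scissors.
# PAYOFF[i][j] is the reward to strategy i against strategy j
# (0 = rock, 1 = paper, 2 = scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
NAMES = ["rock", "paper", "scissors"]


def best_response(opponent):
    """Return the pure strategy with the highest payoff against `opponent`."""
    return max(range(len(PAYOFF)), key=lambda s: PAYOFF[s][opponent])


def naive_self_play(start, steps):
    """Repeatedly replace the agent with a best response to its latest self."""
    history = [start]
    for _ in range(steps):
        history.append(best_response(history[-1]))
    return history


history = naive_self_play(start=0, steps=6)
print(" -> ".join(NAMES[s] for s in history))
```

Every new agent beats its immediate predecessor, yet the sequence only walks around the cycle and returns to where it started, so "beating the previous version" is not a measure of progress in this game.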
On the varieties of zero-sum games
transitive: “relative skill determines who wins”
cyclic: “every strategy has a counter-strategy”
Theorem: Any symmetric two-player zero-sum game decomposes into [ transitive ] + [ cyclic ] components.
transitive: skill determines outcome
cyclic: every strategy has a counter-strategy
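The slide states the theorem without giving a construction. One way to see it concretely is a minimal sketch under stated assumptions: a finite game given as an antisymmetric payoff matrix, with every strategy weighted uniformly. Each strategy's rating is its average payoff, the transitive part is the difference of ratings, and the cyclic part is whatever remains. The function name `decompose` and the example matrices are hypothetical.

```python
def decompose(payoff):
    """Split an antisymmetric payoff matrix into transitive + cyclic parts.

    Assumes payoff[i][j] == -payoff[j][i] and uniform weighting of strategies.
    """
    n = len(payoff)
    # Rating of strategy i: its average payoff against all strategies.
    ratings = [sum(row) / n for row in payoff]
    # Transitive part: the outcome depends only on the rating difference.
    transitive = [[ratings[i] - ratings[j] for j in range(n)] for i in range(n)]
    # Cyclic part: the remainder; each of its rows sums to zero.
    cyclic = [
        [payoff[i][j] - transitive[i][j] for j in range(n)] for i in range(n)
    ]
    return ratings, transitive, cyclic


# Example: a pure "skill" game added to rock-paper-scissors.
SKILL = [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]]
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
MIXED = [[SKILL[i][j] + RPS[i][j] for j in range(3)] for i in range(3)]

ratings, transitive, cyclic = decompose(MIXED)
print("ratings:   ", ratings)
print("transitive:", transitive)
print("cyclic:    ", cyclic)
```

On this example the sketch recovers the two ingredients: the transitive part equals `SKILL` and the cyclic part equals `RPS`.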
The paper: How to formulate useful objectives in non-transitive games
New tools:
● Gamescapes (generalize landscapes, but represent many objectives)
● Population-level performance measures
● Population-level training algorithms
