Feedback Control for Learning in Games Gurdal ARSLAN & Jeff - PowerPoint PPT Presentation

Feedback Control for Learning in Games Gurdal ARSLAN & Jeff SHAMMA Mechanical and Aerospace Engineering UCLA

Setup: Repeated Games
• Time k = 1, 2, 3, …
• Player i:
  – Strategy: p_i(k) ∈ Δ
  – Action: a_i(k) = rand[p_i(k)]
  – Payoff: U_i(a_i, a_{-i}) = a_i^T M_i a_{-i}
  – Play: p_i(k) = F(information up to time k)
• Assume players do not share utilities!
How can simple rules lead players to mixed strategy Nash equilibrium?
• Separate issues: Will they? Should they? Compute NE?
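A minimal sketch of one round of the two-player setup, assuming actions are encoded as one-hot vectors so that the payoff is the bilinear form a_i^T M_i a_{-i}. The function name `play_round` and the use of NumPy's random generator are illustrative choices, not part of the slides.

```python
import numpy as np

def play_round(p1, p2, M1, M2, rng):
    """Sample one action per player from its mixed strategy and return payoffs."""
    # Actions are one-hot vectors drawn according to the mixed strategies.
    a1 = np.eye(len(p1))[rng.choice(len(p1), p=p1)]
    a2 = np.eye(len(p2))[rng.choice(len(p2), p=p2)]
    # Each player evaluates its own payoff matrix only (utilities are not shared).
    u1 = a1 @ M1 @ a2   # U_1 = a_1^T M_1 a_2
    u2 = a2 @ M2 @ a1   # U_2 = a_2^T M_2 a_1
    return a1, a2, u1, u2
```

For a zero-sum game such as matching pennies, one would pass `M2 = -M1.T` so that the two payoffs sum to zero.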

Prior Work & Convergence
• (Stochastic) Fictitious Play
• No Regret
• New approaches: Multirate, Joint weak calibration, Regret testing, …
• Convergence results:
  – Special cases: NE
  – Correlated equilibria
  – Convex hull of NE
  – “Dwell” near NE

Non-convergence Results
• Shapley game vs Fictitious Play
• Crawford (1985): a wide class of learning mechanisms must fail to converge to mixed strategies
• Jordan anticoordination game: 3 players, each with 2 moves.
  [Figure: players P1, P2, P3]
• Hart & Mas-Colell (2003): consider a larger class and show that Uncoupled + Jordan anticoordination = non-convergence

Preview
• Introduce new uncoupled dynamics based on “feedback control”.
• Demonstrate how convergence to mixed strategy NE can be enabled (including Shapley & Jordan games).
• Best/Better response variants.
• Action/Payoff based versions.
• Two/Multi-player cases.

Feedback Control
[Figure: feedback loop. Labels: desired behavior, error, controller K, process P, disturbance, actual behavior, feedback (+/–)]
• K = controller = sequential decision maker
• P = process with approximate model P_model
• Think of “standing upright”

What’s the Connection?
• FB → GT:
  – New initiatives in “cooperative control” (combat systems, networks, self-assembly, automata teams, …) require general sum formulation.
• GT → FB:
  [Figure: decision makers DM1 to DM5]
  DM_i is in feedback with DM_{-i}

Typical Controller: PID
• Proportional + Integral + Derivative
  – K_P ⇒ current error
  – K_I ⇒ error history
  – K_D ⇒ error change
• “Workhorse” of traditional control design.
• Model of human motion control, homeostasis, …
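The three terms can be sketched as a discrete-time controller. This is a textbook PID form under assumed names (`PID`, `kp`, `ki`, `kd`, `dt`), not code from the presentation; the derivative is a simple backward difference and is set to zero on the first call.

```python
class PID:
    """Discrete-time PID controller (textbook form, not taken from the slides)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0      # running integral of the error (error history)
        self.prev_error = None   # previous error sample (for the error change)

    def update(self, error):
        """Return the control signal for the current error sample."""
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0     # no history yet on the first call
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

Calling `update` once per sample with the current error returns the sum of the proportional, integral, and derivative contributions.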

Derivative Action
[Figure: error e versus time, marking t (now) and t + τ]
• React to predicted error
• Example: “Balancing”
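One way to read "react to predicted error" is as a first-order extrapolation of the error over a look-ahead τ. The identification of τ with K_D / K_P below is a standard textbook observation about proportional-plus-derivative control, offered here as an illustration rather than something stated on the slide.

```latex
e(t+\tau) \approx e(t) + \tau\,\dot e(t),
\qquad
K_P\,e(t) + K_D\,\dot e(t)
  = K_P\Bigl(e(t) + \tfrac{K_D}{K_P}\,\dot e(t)\Bigr)
  \approx K_P\,e(t+\tau)
\quad\text{with } \tau = K_D / K_P .
```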

Repeated Games in Continuous Time
• Empirical frequencies:
• ODE method of stochastic approximation:
  Deterministic continuous time analysis
  ⇓
  Probabilistic discrete time conclusions
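The slide's formula for empirical frequencies did not survive extraction. A minimal sketch, assuming the usual definition (the running average of a player's one-hot action vectors), written in the recursive form that the stochastic approximation argument works with. The name `update_frequency` is hypothetical.

```python
import numpy as np

def update_frequency(q, a, k):
    """Running average of one-hot actions: q(k) = q(k-1) + (a(k) - q(k-1)) / k."""
    return q + (a - q) / k

# Usage: fold a short action history into an empirical frequency vector.
q = np.zeros(2)
for k, idx in enumerate([0, 1, 0, 0], start=1):
    q = update_frequency(q, np.eye(2)[idx], k)
```

The 1/k step size is what lets the discrete recursion be compared with a continuous-time ODE.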

Derivative Action FP (DAFP)
• Define smoothed best response:
• FP:
• Derivative action FP:
• “First order” model of adversary: Moving target.
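The formulas on this slide were lost, so the following is only a sketch under stated assumptions: the smoothed best response is taken to be the logit (softmax) response with temperature `tau`, FP is taken to move the empirical frequency toward the response to the opponent's frequency, and derivative action is taken to respond to the first-order prediction q_{-i} + γ q̇_{-i} (the "moving target"). The function names and the exact functional forms are assumptions.

```python
import numpy as np

def smoothed_best_response(payoff, tau):
    """Logit (softmax) response to a vector of expected payoffs; tau > 0 smooths."""
    z = (payoff - payoff.max()) / tau   # shift by the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def fp_rate(q_i, q_other, M_i, tau):
    """Assumed continuous-time FP: dq_i/dt = beta_i(q_other) - q_i."""
    return smoothed_best_response(M_i @ q_other, tau) - q_i

def dafp_rate(q_i, q_other, dq_other, M_i, tau, gamma):
    """Assumed DAFP: respond to the predicted frequency q_other + gamma * dq_other."""
    predicted = q_other + gamma * dq_other
    return smoothed_best_response(M_i @ predicted, tau) - q_i
```

With `gamma = 0` or a stationary opponent (`dq_other = 0`), the derivative action version reduces to the plain FP rate in this sketch.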

Ideal vs Approximate
• Ideal ⇒ Implicit Equations
• Approximate:
• Use of ideal differentiators can always lead to NE (a misleading conclusion).

Approximate Differentiator
• Define:
• Asymptotically
• Two-player implementation:
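The definition on this slide was lost. A minimal sketch, assuming the common first-order filter form: a state r tracks the signal q through r' = λ(q − r), and λ(q − r) serves as the derivative estimate, which approaches the true derivative as λ grows. The function name and the forward-Euler discretization are illustrative assumptions.

```python
import numpy as np

def approximate_derivative(signal, dt, lam):
    """Estimate d(signal)/dt using r' = lam * (q - r) with output lam * (q - r)."""
    r = signal[0]                     # filter state starts on the signal
    est = np.empty_like(signal)
    for k, q in enumerate(signal):
        est[k] = lam * (q - r)        # derivative estimate at this sample
        r = r + dt * lam * (q - r)    # forward-Euler step of the filter state
    return est

# Usage: differentiate sin(t); after a short transient the estimate is close to cos(t).
t = np.arange(0.0, 10.0, 1e-3)
est = approximate_derivative(np.sin(t), dt=1e-3, lam=50.0)
```

Larger values of `lam` reduce the lag of the estimate but require a smaller step `dt` for the Euler update to stay stable.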

Local Convergence of DAFP
• Theorem: Consider a two-player game with a NE […].
  1) […] stable at […]; […] stable at […]
  2) […] unstable at […], with […]; […] stable at […]
  where […] are the eigenvalues of linearized […]

Jordan Anticoordination Revisited
• Unique mixed NE is unstable under […]
• […], hence stabilizable by […]

Extensions to “Gradient Play”
• “Better Response” = GP
• DAGP:
• Theorem: Similar … using eigenvalues of […]
• Shapley & Jordan games convergent.

Crawford & Conlisk
• Crawford (1985): Nonconvergence of a class of algorithms.
• Conlisk (1993): “Adaptation in games: Two solutions to the Crawford puzzle”, J. of Economic Behavior and Organization.
  – Two-player zero-sum games
  – Play in “rounds” (…, R-1, R, R+1, …)
  – On round R+1, adjust the mixed strategy using a “forecast” payoff based on intervals R & R-1

Discrete Time
• Theorem: Local attractor in continuous time ⇒ Positive probability of convergence to NE in discrete time.
• … as opposed to zero probability.

Payoff Based Rules
• Use “stimulus response”
• Theorem: Positive probability of convergence to NE.

Jordan Anticoordination: Payoff Based DAGP
[Figure: parameters γ = 1, λ = 50, ε = 0.1]

Multiplayer Games
• Immediate extensions in case of “pair-wise utility” structure:
• Otherwise, must inspect “joint-action” version of FP.

Concluding Remarks
• Feedback control motivates the use of auxiliary dynamics to enable NE convergence.
• Other “controller” structures possible (all mixed strategy equilibria “stabilizable”).
• DAFP & DAGP respect “graph” structures.
• Key concerns:
  – Natural?
  – Strategic?
