Nash Q-Learning for General-Sum Stochastic Games
Hu & Wellman
CS286r, March 6th, 2006
Presented by Ilan Lobel
Outline
– Stochastic Games and Markov Perfect Equilibria
– Bellman's Operator as a Contraction Mapping
– Stochastic Approximation of a Contraction Mapping
– Application to Zero-Sum Markov Games: Minimax-Q Learning
– Theory of Nash-Q Learning
– Empirical Testing of Nash-Q Learning
How do we model games that evolve over time? Stochastic Games!
Current Game = State
Ingredients:
– Agents (N)
– States (S)
– Payoffs (R)
– Transition Probabilities (P)
– Discount Factor (δ)
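A minimal sketch of one way to hold these ingredients in code; the container and field names below are my own illustration, not part of the slides or the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class StochasticGame:
    """One possible container for a finite stochastic game; field names are illustrative."""
    n_agents: int                               # N
    states: List[str]                           # S
    actions: Dict[str, List[Tuple]]             # joint actions available in each state
    rewards: Dict[Tuple, Tuple[float, ...]]     # (state, joint action) -> one payoff per agent
    transitions: Dict[Tuple, Dict[str, float]]  # (state, joint action) -> {next state: probability}
    delta: float                                # discount factor
```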
Example of a Stochastic Game (δ = 0.9)
State 1:
      C     D
A    1,2   3,4
B    5,6   7,8
– Move with 50% probability when (A,C) or (A,D).
State 2:
      C     D      E
A   -1,2  -3,4    0,0
B   -5,6  -7,8  -10,10
– Move with 30% probability when (B,D).
Markov Game is a Generalization of…
– Repeated Games (add states)
– MDPs (add agents)
Markov Perfect Equilibrium (MPE)
– Strategy maps states into randomized actions: πi : S → Δ(A)
– No agent has an incentive to unilaterally change her policy.
Cons & Pros of MPEs
Cons:
– Can't implement everything described by the Folk Theorems (i.e., no trigger strategies).
Pros:
– MPEs always exist in finite Markov Games (Fink, 64).
– Easier to "search for".
Learning in Stochastic Games
Learning is especially important in Markov Games because MPEs are hard to compute.
Do we know:
– Our own payoffs?
– Others' rewards?
– Transition probabilities?
– Others' strategies?
Learning in Stochastic Games
Adapted from Reinforcement Learning:
– Minimax-Q Learning (zero-sum games)
– Nash-Q Learning
– CE-Q Learning
Zero-Sum Stochastic Games
Nice properties:
– All equilibria have the same value.
– Any equilibrium strategy of player 1 against any equilibrium strategy of player 2 produces an MPE.
– There is a Bellman-type equation.
Bellman's Equation in DP
Bellman Operator:
– (TV)(s) = max_a [ R(s,a) + δ Σ_s' P(s'|s,a) V(s') ]
Bellman's Equation, rewritten:
– V* = TV*
Contraction Mapping
– The Bellman Operator has the contraction property: ||TV − TV'||∞ ≤ δ ||V − V'||∞
– Bellman's Equation (a unique fixed point V* = TV*) is a direct consequence of the contraction.
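A small numerical sketch of the Bellman operator and its contraction property on a made-up random MDP; the instance and the function name are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, delta = 4, 3, 0.9

# A small random MDP: rewards R[s, a] and transition probabilities P[s, a, s'].
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
P = rng.uniform(0.0, 1.0, size=(n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)

def bellman(V):
    # (TV)(s) = max_a [ R(s, a) + delta * sum_s' P(s'|s, a) V(s') ]
    return np.max(R + delta * (P @ V), axis=1)

# Contraction check in the sup norm: ||TV - TV'|| <= delta * ||V - V'||
V1, V2 = rng.normal(size=n_states), rng.normal(size=n_states)
assert np.max(np.abs(bellman(V1) - bellman(V2))) <= delta * np.max(np.abs(V1 - V2)) + 1e-12
```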
The Shapley Operator for Zero-Sum Stochastic Games
– (TV)(s) = val[ R(s,a1,a2) + δ Σ_s' P(s'|s,a1,a2) V(s') ], where val[·] is the minimax value of the matrix game.
– The Shapley Operator is a contraction mapping. (Shapley, 53)
– Hence, it also has a fixed point, which is an MPE: V* = TV*
Value Iteration for Zero-Sum Stochastic Games
– Iterate Vk+1 = TVk.
– Direct consequence of contraction.
– Converges to the fixed point of the operator.
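A sketch of value iteration with the Shapley operator for a small zero-sum stochastic game given as arrays R[s,a1,a2] and P[s,a1,a2,s']; the matrix-game value at each state is computed with a standard linear program. The setup and names are mine, offered as an illustration of the idea rather than the paper's implementation.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of the zero-sum matrix game A (row player maximizes)."""
    m, n = A.shape
    # Variables: row player's mixed strategy x (m entries) and the value v.
    # Maximize v subject to sum_i x_i * A[i, j] >= v for every column j.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                            # linprog minimizes, so minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])               # v - sum_i x_i A[i, j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])   # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]               # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def shapley_operator(V, R, P, delta):
    """(TV)(s) = value of the matrix game R(s,.,.) + delta * E[V(s') | s, a1, a2]."""
    return np.array([matrix_game_value(R[s] + delta * (P[s] @ V)) for s in range(R.shape[0])])

def value_iteration(R, P, delta, tol=1e-6):
    """Iterate V <- TV until the sup-norm change is below tol (the contraction guarantees this)."""
    V = np.zeros(R.shape[0])
    while True:
        V_next = shapley_operator(V, R, P, delta)
        if np.max(np.abs(V_next - V)) < tol:
            return V_next
        V = V_next
```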
Q-Learning
Another consequence of a contraction mapping:
– Q-Learning converges!
Q-Learning can be described as an approximation of value iteration:
– Value iteration with noise.
Q-Learning Convergence
Q-Learning is called a Stochastic Iterative Approximation of Bellman's operator:
– Learning rate of 1/t.
– Noise is zero-mean and has bounded variance.
It converges if all state-action pairs are visited infinitely often. (Neuro-Dynamic Programming – Bertsekas, Tsitsiklis)
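A sketch of one tabular Q-learning update with a 1/t learning rate, matching the stochastic-approximation description above; the table layout and names are assumptions for illustration.

```python
import numpy as np

def q_learning_step(Q, visits, s, a, r, s_next, delta):
    """One tabular Q-learning update with a 1/t learning rate per (s, a) pair."""
    visits[s, a] += 1
    alpha = 1.0 / visits[s, a]                  # learning rate 1/t for this state-action pair
    target = r + delta * np.max(Q[s_next])      # noisy sample of the Bellman backup
    Q[s, a] = (1.0 - alpha) * Q[s, a] + alpha * target
    return Q
```

Here Q and visits would be (n_states, n_actions) arrays; convergence requires every (s, a) pair to keep being visited, e.g. through some exploration scheme.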
Minimax-Q Learning Algorithm for Zero-Sum Stochastic Games
– Initialize your Q0(s,a1,a2) for all states and actions.
– Update rule: Qk+1(sk,a1k,a2k) = (1 − αk) Qk(sk,a1k,a2k) + αk [ rk + δ max_π1 min_a2 Σ_a1 π1(a1) Qk(sk+1,a1,a2) ]
– Player 1 then chooses action u1 in the next state sk+1, according to the maximizing mixed strategy π1.
Minimax-Q Learning
– It's a Stochastic Iterative Approximation of the Shapley Operator.
– It converges to a Nash Equilibrium if all state-action-action triplets are visited infinitely often. (Littman, 96)
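A sketch of one minimax-Q update for player 1, where the next-state value is the minimax value of the Q-factor matrix at sk+1; the matrix-game solver is passed in (e.g. the LP helper from the value-iteration sketch above), and all names are illustrative.

```python
def minimax_q_step(Q, visits, s, a1, a2, r, s_next, delta, matrix_game_value):
    """One minimax-Q update for player 1, with Q of shape (S, A1, A2).

    matrix_game_value solves a zero-sum matrix game for the row player's value,
    e.g. the LP-based helper sketched after the value-iteration slide.
    """
    visits[s, a1, a2] += 1
    alpha = 1.0 / visits[s, a1, a2]
    # Next-state value: max over player 1's mixed strategies of the worst-case
    # expected Q-factor, i.e. the minimax value of the matrix Q[s_next].
    v_next = matrix_game_value(Q[s_next])
    Q[s, a1, a2] = (1.0 - alpha) * Q[s, a1, a2] + alpha * (r + delta * v_next)
    return Q
```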
Can we extend it to General-Sum Stochastic Games?
– Yes & No.
– Nash-Q Learning is such an extension.
– However, it has much worse computational and theoretical properties.
Nash-Q Learning Algorithm
– Initialize Q0j(s,a1,a2) for all states, actions and for every agent j.
– You must simulate everyone's Q-factors.
– Update rule: Qk+1j(sk,a1k,a2k) = (1 − αk) Qkj(sk,a1k,a2k) + αk [ rkj + δ Nashj(Qk1(sk+1), Qk2(sk+1)) ], where Nashj(·) is agent j's payoff in a selected Nash Equilibrium of the stage game defined by the Q-factors at sk+1.
– Choose the randomized action generated by the Nash operator.
The Nash Operator and the Principle of Optimality
– The Nash Operator finds the Nash of a stage game.
– Find the Nash of the stage game with the Q-factors as your payoffs.
– The Q-factor combines the current reward with the payoffs for the rest of the Markov Game.
The Nash Operator
– Unknown complexity, even for 2 players.
– In comparison, the minimax operator can be solved in polynomial time (there's a linear programming formulation).
– For convergence, all players must break ties in favor of the same Nash Equilibrium.
– Why not go model-based if computation is so expensive?
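A sketch of the Nash operator and one Nash-Q update for two players, using the third-party nashpy package as a stand-in stage-game solver; picking the first equilibrium returned is an arbitrary tie-break, which is exactly the coordination issue noted above. Names and structure are mine, not Hu & Wellman's.

```python
import nashpy as nash  # third-party bimatrix-game solver, used here as a stand-in

def nash_operator(Q1_s, Q2_s):
    """Return a stage-game Nash equilibrium (pi1, pi2) of the bimatrix game
    (Q1_s, Q2_s) and each player's expected payoff under it."""
    game = nash.Game(Q1_s, Q2_s)
    # support_enumeration() yields all equilibria; taking the first is an arbitrary
    # tie-break, and convergence needs every agent to select the same equilibrium.
    pi1, pi2 = next(game.support_enumeration())
    return pi1, pi2, pi1 @ Q1_s @ pi2, pi1 @ Q2_s @ pi2

def nash_q_step(Q1, Q2, visits, s, a1, a2, r1, r2, s_next, delta):
    """One Nash-Q update; each agent keeps Q-factors for both players."""
    visits[s, a1, a2] += 1
    alpha = 1.0 / visits[s, a1, a2]
    _, _, v1, v2 = nash_operator(Q1[s_next], Q2[s_next])
    Q1[s, a1, a2] = (1.0 - alpha) * Q1[s, a1, a2] + alpha * (r1 + delta * v1)
    Q2[s, a1, a2] = (1.0 - alpha) * Q2[s, a1, a2] + alpha * (r2 + delta * v2)
    return Q1, Q2
```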
Convergence Results
– If every stage game encountered during learning has a global optimum, Nash-Q converges.
– If every stage game encountered during learning has a saddle point, Nash-Q converges.
– Both of these are VERY strong assumptions.
Convergence Result Analysis
– The global optimum assumption implies full cooperation between agents.
– The saddle point assumption implies no cooperation between agents.
– Are these equivalent to DP Q-Learning and minimax-Q Learning, respectively?
Empirical Testing: The Grid-world
[Figure: Grid World 1, with some of its Nash Equilibria shown]
Empirical Testing: Nash Equilibria
[Figure: all Nash Equilibria of World 2, annotated with percentages (3%, 3%, 97%)]
Empirical Performance
– In very small and simple games, Nash-Q Learning often converged even though theory did not predict so.
– In particular, when all Nash Equilibria have the same value, Nash-Q did better than expected.
Conclusions
Nash-Q is a nice step forward:
– It can be used for any Markov Game.
– It uses the Principle of Optimality in a smart way.
But there is still a long way to go:
– Convergence results are weak.
– There are no computational complexity results.