Multigrid methods for zero-sum two-player stochastic games with mean reward



  1. Multigrid methods for zero-sum two-player stochastic games with mean reward. Sylvie Detournay and Marianne Akian, INRIA Saclay and CMAP, École Polytechnique (France). 15th Copper Mountain Conference on Multigrid Methods, 27 March to 1 April 2011.

  2. DP for zero-sum stochastic games with mean reward. Dynamic programming equation of zero-sum two-player stochastic games with mean reward:

$$\rho + v(x) = \max_{\alpha \in \mathcal{A}(x)} \min_{\beta \in \mathcal{B}(x,\alpha)} \Big[ \sum_{y \in X} P(y \mid x,\alpha,\beta)\, v(y) + r(x,\alpha,\beta) \Big], \quad \forall x \in X \qquad \text{(DP)}$$

where $X$ is the state space; $\rho$ is the mean reward of the game, a nonlinear eigenvalue; $v(x)$ is the bias or relative value of the game starting at $x \in X$; $\alpha$, $\beta$ are the actions of the first player (MAX) and the second player (MIN); $r(x,\alpha,\beta)$ is the reward paid by MIN to MAX; and $P(y \mid x,\alpha,\beta)$ is the transition probability from $x$ to $y$ given the actions $\alpha$, $\beta$.
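Concretely, (DP) says that $\rho$ and $v$ form a nonlinear eigenpair of the operator $F$ defined by the right-hand side: $\rho + v = F(v)$. Below is a minimal sketch of one evaluation of $F$, under the simplifying assumption (ours, not the slides') that every state shares the same finite action sets, so $P$ and $r$ can be stored as dense arrays:

```python
import numpy as np

def dp_operator(v, P, r):
    """One evaluation of the dynamic programming operator
        F(v; x) = max_a min_b [ sum_y P(y|x,a,b) v(y) + r(x,a,b) ].
    Illustrative dense encoding (not the paper's data structures):
        P: (n_states, n_a, n_b, n_states) transition probabilities,
        r: (n_states, n_a, n_b) stage rewards paid by MIN to MAX,
        v: (n_states,) current bias vector.
    """
    q = P @ v + r                      # expected value + reward, shape (n_states, n_a, n_b)
    return q.min(axis=2).max(axis=1)   # MIN picks b first, then MAX picks a

# (DP) then asks for a pair (rho, v) with rho + v == dp_operator(v, P, r).
```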

  3. DP for zero-sum stochastic games with mean reward. Value of the game with mean reward starting at $x \in X$:

$$\rho(x) = \sup_{(\alpha_k)_{k \ge 0}} \inf_{(\beta_k)_{k \ge 0}} \limsup_{N \to \infty} \frac{1}{N}\, \mathbb{E}\Big[ \sum_{k=0}^{N-1} r(X_k, \alpha_k, \beta_k) \Big]$$

where $\alpha_k = \alpha_k(X_k, \alpha_{k-1}, \beta_{k-1}, \dots)$ and $\beta_k = \beta_k(X_k, \alpha_k, \alpha_{k-1}, \beta_{k-1}, \dots)$ are strategies, and the state process $X_k$ satisfies $P(X_{k+1} = y \mid X_k = x, \alpha_k = \alpha, \beta_k = \beta) = P(y \mid x, \alpha, \beta)$.
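For a *fixed* pair of feedback strategies the sup/inf disappears, and $\rho(x)$ is just a long-run average along the induced Markov chain, which can be estimated by simulation. A minimal sketch, assuming the strategies have already been substituted to give an induced transition matrix `P` and reward vector `r` (an encoding we introduce for illustration):

```python
import numpy as np

def estimate_mean_reward(P, r, x0, n_steps=100_000, seed=0):
    """Monte Carlo estimate of the mean reward starting from state x0,
    for the Markov chain induced by fixed feedback strategies:
    P[x, y] = transition probability, r[x] = stage reward at x."""
    rng = np.random.default_rng(seed)
    x, total = x0, 0.0
    for _ in range(n_steps):
        total += r[x]
        x = rng.choice(len(P), p=P[x])   # draw the next state
    return total / n_steps
```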

  4. A deterministic zero-sum game. Deterministic zero-sum two-player game: the circles (resp. squares) represent the nodes at which Max (resp. Min) can play. [Figure: a weighted directed graph over Max nodes (circles) and Min nodes (squares, labeled 1′, 2′, 3′, 4′); the edge weights did not survive the extraction.] Values in the (DP) equation: $X = \{\text{Max nodes}\}$; $\mathcal{A}(x) = \{\text{Min nodes accessible from } x\}$; $\mathcal{B}(x,\alpha) = \{\text{Max nodes accessible from } \alpha\}$; $r(x,\alpha,\beta) = \text{weight}(x,\alpha) + \text{weight}(\alpha,\beta)$; $y = \beta$.

  5-8. A deterministic zero-sum game. If Max initially moves to 2′, he eventually loses 5 per turn. [Figure: the same graph, highlighting the play after Max moves to 2′.]

  9-12. A deterministic zero-sum game. But if Max initially moves to 1′, he eventually loses only $(1 + 0 + 2 + 3)/2 = 3$ per turn. [Figure: the same graph, highlighting the cycle reached after Max moves to 1′.]
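The per-turn figures above are cycle means: one turn is a Max move followed by a Min move, so the four-edge cycle reached via 1′ lasts two turns and costs $(1 + 0 + 2 + 3)/2 = 3$ per turn. A trivial sketch of that arithmetic (the edge weights are read off the slide's computation; the exact cycle structure did not survive the extraction):

```python
def cycle_mean_per_turn(edge_weights):
    """Mean reward per turn along a periodic play: two edges per turn
    (one Max move, then one Min move)."""
    return sum(edge_weights) / (len(edge_weights) / 2)

print(cycle_mean_per_turn([1, 0, 2, 3]))  # 3.0, the loss per turn via 1'
```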

  13. DP for zero-sum stochastic games. Optimal strategies and dynamic programming:

$$\rho(x) = \sup_{(\alpha_k)_{k \ge 0}} \inf_{(\beta_k)_{k \ge 0}} \limsup_{N \to \infty} \frac{1}{N}\, \mathbb{E}\Big[ \sum_{k=0}^{N-1} r(X_k, \alpha_k, \beta_k) \Big], \quad x \in X.$$

For feedback strategies $\alpha_k = \bar\alpha(X_k)$, $\beta_k = \bar\beta(X_k, \bar\alpha(X_k))$, define the matrix $P^{\bar\alpha,\bar\beta}_{xy} := P(y \mid x, \bar\alpha(x), \bar\beta(x, \bar\alpha(x)))$. If $P^{\bar\alpha,\bar\beta}$ is irreducible for all $\bar\alpha$ and $\bar\beta$, then $\rho(x) \equiv \rho$ is the unique solution of (DP), and the $\bar\alpha$, $\bar\beta$ attaining the max and min in (DP) are optimal feedback strategies for both players.

  14. DP for zero-sum stochastic games. Dynamic programming equation of zero-sum two-player stochastic differential games: the Isaacs PDE (diffusion problems)

$$-\rho + H\Big(x, \frac{\partial v}{\partial x}, \frac{\partial^2 v}{\partial x_i \partial x_j}\Big) = 0, \quad x \in X \qquad \text{(I)}$$

where

$$H(x, p, K) = \max_{\alpha \in \mathcal{A}(x)} \min_{\beta \in \mathcal{B}(x,\alpha)} \Big[ p \cdot f(x,\alpha,\beta) + \tfrac{1}{2} \operatorname{tr}\big(\sigma(x,\alpha,\beta)\,\sigma^T(x,\alpha,\beta)\, K\big) + r(x,\alpha,\beta) \Big].$$

Discretization of (I) with monotone schemes yields (DP).
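As an illustration of why monotone schemes produce a (DP) equation: in one dimension, upwinding the drift and central-differencing the diffusion give nonnegative stencil weights that sum to one, i.e. genuine transition probabilities (the standard Kushner-Dupuis construction; a sketch under that assumption, not necessarily the scheme used in the talk). For each fixed $(\alpha, \beta)$:

```python
def monotone_stencil(f, sigma, h):
    """Monotone discretization of f v' + (sigma^2/2) v'' on a 1-D grid of
    step h: upwind the first derivative, central-difference the second.
    Returns (p_plus, p_minus, dt): probabilities of jumping to x+h / x-h
    and the local time step. Upwinding keeps both probabilities >= 0,
    which is exactly the monotonicity yielding a (DP)-type equation
        rho*dt + v(x) = p_plus*v(x+h) + p_minus*v(x-h) + r*dt.
    """
    f_plus, f_minus = max(f, 0.0), max(-f, 0.0)
    denom = sigma**2 + h * (f_plus + f_minus)
    dt = h**2 / denom
    p_plus = (sigma**2 / 2 + h * f_plus) / denom
    p_minus = (sigma**2 / 2 + h * f_minus) / denom
    return p_plus, p_minus, dt   # p_plus + p_minus == 1
```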

  15. DP for zero-sum stochastic games. Motivation: (i) solve dynamic programming equations arising from the discretization of Isaacs equations, for example long-term diffusion problems, risk-sensitive problems (finance), singular perturbations of Isaacs equations, etc.; (ii) solve large-scale zero-sum stochastic games (with discrete state space), for example problems arising from the web, problems in the verification of programs in computer science, etc.; (iii) extend the equation to the general case, that is, without the irreducibility assumption. → Use the policy iteration algorithm combined with multigrid methods to solve the dynamic programming equation.

  16. DP for zero-sum stochastic games. Dynamic programming for multichain games: in general, the value of the game is a solution of the dynamic programming equation

$$\rho(x)\,(t+1) + v(x) = F(\rho\, t + v;\, x), \quad x \in X, \ t \text{ large enough},$$

where $F$ is the dynamic programming operator

$$F(v; x) := \max_{\alpha \in \mathcal{A}(x)} \min_{\beta \in \mathcal{B}(x,\alpha)} \Big[ \sum_{y \in X} P(y \mid x,\alpha,\beta)\, v(y) + r(x,\alpha,\beta) \Big].$$

($\{\rho\, t + v,\ t \text{ large}\}$ is an invariant half-line.)

  17. DP for zero-sum stochastic games. This is equivalent to solving the system, for $x \in X$:

$$\rho(x) = \max_{\alpha \in \mathcal{A}(x)} \min_{\beta \in \mathcal{B}(x,\alpha)} \sum_{y \in X} P(y \mid x,\alpha,\beta)\, \rho(y)$$
$$\rho(x) + v(x) = \max_{\alpha \in \mathcal{A}_\rho(x)} \min_{\beta \in \mathcal{B}_\rho(x,\alpha)} \Big[ \sum_{y \in X} P(y \mid x,\alpha,\beta)\, v(y) + r(x,\alpha,\beta) \Big]$$

with $\mathcal{A}_\rho(x) := \operatorname{argmax}_{\alpha \in \mathcal{A}(x)} \big( \min_{\beta \in \mathcal{B}(x,\alpha)} \sum_{y \in X} P(y \mid x,\alpha,\beta)\, \rho(y) \big)$ and $\mathcal{B}_\rho(x,\alpha) := \operatorname{argmin}_{\beta \in \mathcal{B}(x,\alpha)} \sum_{y \in X} P(y \mid x,\alpha,\beta)\, \rho(y)$.

For a one-player game:

$$\rho(x) = \min_{\beta \in \mathcal{B}(x)} \sum_{y \in X} P(y \mid x,\beta)\, \rho(y)$$
$$\rho(x) + v(x) = \min_{\beta \in \mathcal{B}_\rho(x)} \Big[ \sum_{y \in X} P(y \mid x,\beta)\, v(y) + r(x,\beta) \Big]$$

with $\mathcal{B}_\rho(x) = \operatorname{argmin}_{\beta \in \mathcal{B}(x)} \sum_{y \in X} P(y \mid x,\beta)\, \rho(y)$.

  18. Policy iteration (PI) algorithm. Multichain policy iteration algorithm for one player (Denardo and Fox, 1967):

1. Start with $\bar\beta_0 : x \mapsto \bar\beta_0(x)$.
2. Calculate the value and bias $(\rho^{k+1}, v^{k+1})$ for the policy $\bar\beta_k$, solution of
$$\rho^{k+1} = P^{\bar\beta_k} \rho^{k+1}, \qquad \rho^{k+1} + v^{k+1} = P^{\bar\beta_k} v^{k+1} + r^{\bar\beta_k}.$$
3. Improve the policy: find $\bar\beta_{k+1}$ optimal for $(\rho^{k+1}, v^{k+1})$:
$$\bar\beta_{k+1}(x) \in \operatorname{argmin}_{\beta \in \mathcal{B}_{\rho^{k+1}}(x)} \Big[ \sum_{y \in X} P(y \mid x,\beta)\, v^{k+1}(y) + r(x,\beta) \Big], \quad x \in X,$$
with $\mathcal{B}_\rho(x) = \operatorname{argmin}_{\beta \in \mathcal{B}(x)} \sum_{y \in X} P(y \mid x,\beta)\, \rho(y)$.
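A minimal sketch of this one-player algorithm, simplified to the unichain case where $\rho$ is a single scalar and one normalization $v[0] = 0$ suffices. (The multichain algorithm of Denardo and Fox needs one normalization per ergodic class, and its first improvement stage over $\mathcal{B}_\rho$ is vacuous here, since every row of a stochastic matrix averages a constant $\rho$ to itself.) All names are illustrative:

```python
import numpy as np

def evaluate_policy(P, r):
    """Solve rho*1 + v = P v + r with v[0] = 0 (unichain case).
    Unknowns: v (n entries) and the scalar rho."""
    n = len(r)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = np.eye(n) - P
    A[:n, n] = 1.0            # column multiplying rho
    A[n, 0] = 1.0             # normalization row: v[0] = 0
    b = np.concatenate([r, [0.0]])
    sol = np.linalg.lstsq(A, b, rcond=None)[0]
    return sol[n], sol[:n]    # rho, v

def policy_iteration(P_all, r_all, policy, tol=1e-12):
    """P_all[b] is the (n, n) transition matrix of action b, r_all[b] its
    reward vector; policy[x] is MIN's current action (an int) at state x."""
    policy = np.asarray(policy)
    n = len(policy)
    while True:
        P = np.array([P_all[policy[x]][x] for x in range(n)])
        r = np.array([r_all[policy[x]][x] for x in range(n)])
        rho, v = evaluate_policy(P, r)
        q = np.stack([P_all[b] @ v + r_all[b] for b in range(len(P_all))])
        new_policy = q.argmin(axis=0)
        # Conservative improvement: keep the old action when it is optimal.
        keep = q[policy, np.arange(n)] <= q.min(axis=0) + tol
        new_policy[keep] = policy[keep]
        if np.array_equal(new_policy, policy):
            return rho, v, policy
        policy = new_policy
```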

  19. Policy iteration (PI) algorithm. It is easy to show that $\rho^{k+1} \le \rho^k$. If $\rho^{k+1} = \rho^k$, the iteration is degenerate: $v^{k+1}$ is only defined up to $\operatorname{Ker}(I - P^{\bar\beta_k})$, whose dimension equals the number of ergodic classes of $P^{\bar\beta_k}$, which is $\ge 1$. → PI may cycle when there are multiple ergodic classes. To avoid this: optimal strategies are improved in a conservative way ($\bar\beta_{k+1}(x) = \bar\beta_k(x)$ if it is still optimal), and $v^{k+1}$ is fixed at one point of each ergodic class of $P^{\bar\beta_k}$. ⇒ When $\rho^{k+1} = \rho^k$, $v^{k+1}(x) = v^k(x)$ on each ergodic class of $P^{\bar\beta_k}$. ⇒ $(\rho^k, v^k)_{k \ge 1}$ is nonincreasing in the lexicographic order: $\rho^{k+1} \le \rho^k$, and if $\rho^{k+1} = \rho^k$ then $v^{k+1} \le v^k$. ⇒ PI stops after a finite time when the sets of actions are finite. Remark: PI ≈ Newton's algorithm in the case with a unique solution $v$.
