Empirical-evidence Equilibria in Stochastic Games
Nicolas Dudebout
Outline
• Stochastic games
• Empirical-evidence equilibria (EEEs)
• Open questions in EEEs
Stochastic Games
• Game theory
• Markov decision processes
Game Theory
Decision making    𝑣 ∶ ℬ → ℝ  ⟹  𝑏* ∈ arg max_{𝑏 ∈ ℬ} 𝑣(𝑏)
Game theory        𝑣¹ ∶ ℬ¹ × ℬ² → ℝ  and  𝑣² ∶ ℬ¹ × ℬ² → ℝ
Nash equilibrium   ⎧ 𝑏*¹ ∈ arg max_{𝑏¹ ∈ ℬ¹} 𝑣¹(𝑏¹, 𝑏*²)
                   ⎩ 𝑏*² ∈ arg max_{𝑏² ∈ ℬ²} 𝑣²(𝑏*¹, 𝑏²)
Example: Battle of the Sexes
Payoff matrix (player 1 picks a row, player 2 a column):
        𝐺        𝑃
𝐺     2, 2     0, 1
𝑃     0, 0     1, 3
Nash equilibria
• (𝐺, 𝐺)
• (𝑃, 𝑃)
• (3/4 𝐺 + 1/4 𝑃, 1/3 𝐺 + 2/3 𝑃)
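The mixed equilibrium can be checked numerically: at (3/4 𝐺 + 1/4 𝑃, 1/3 𝐺 + 2/3 𝑃) each player must be indifferent between their two pure actions. The payoff numbers below are the reconstructed matrix from this slide, assumed rather than taken from a verified source:

```python
import numpy as np

# Assumed Battle-of-the-Sexes payoffs (rows: player 1 plays G or P,
# columns: player 2 plays G or P); reconstructed for illustration.
U1 = np.array([[2.0, 0.0],   # player 1's payoffs
               [0.0, 1.0]])
U2 = np.array([[2.0, 1.0],   # player 2's payoffs
               [0.0, 3.0]])

p = np.array([3/4, 1/4])     # player 1 mixes over (G, P)
q = np.array([1/3, 2/3])     # player 2 mixes over (G, P)

# At a mixed equilibrium each player is indifferent between pure actions.
payoffs_1 = U1 @ q           # player 1's expected payoff per pure action
payoffs_2 = p @ U2           # player 2's expected payoff per pure action
print(payoffs_1, payoffs_2)
```

Both vectors come out constant, confirming indifference and hence the mixed equilibrium.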
Markov Decision Process (MDP)
Dynamic       𝑦⁺ ∼ 𝑔(𝑦, 𝑏)  ⟺  𝑦_{𝑢+1} ∼ 𝑔(𝑦_𝑢, 𝑏_𝑢)
Stage cost    𝑣(𝑦, 𝑏)
History       ℎ_𝑢 = (𝑦₀, 𝑦₁, …, 𝑦_𝑢, 𝑏₀, 𝑏₁, …, 𝑏_𝑢)
Strategy      𝜏 ∶ ℋ → ℬ; for an MDP an optimal strategy can be restricted to the state, 𝜏 ∶ 𝒴 → ℬ
Utility       𝑉(𝜏) = 𝔼_{𝑔,𝜏}[∑_{𝑢=0}^∞ 𝜀^𝑢 𝑣(𝑦_𝑢, 𝑏_𝑢)]
Bellman's equation   𝑉*(𝑦) = max_{𝑏 ∈ ℬ} {𝑣(𝑦, 𝑏) + 𝜀 𝔼_𝑔[𝑉*(𝑦⁺) | 𝑦, 𝑏]}
Dynamic programming: use knowledge of 𝑔
Reinforcement learning: learn 𝑔 from repeated interaction
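Bellman's equation can be solved by value iteration, the basic dynamic-programming scheme. The sketch below uses made-up transition probabilities and stage rewards (none of the numbers come from the talk):

```python
import numpy as np

# Value iteration for a small MDP: g[y, b] is a distribution over next
# states and v[y, b] is the stage reward; all numbers are placeholders.
n_states, n_actions, eps = 3, 2, 0.9   # eps is the discount factor

rng = np.random.default_rng(0)
g = rng.random((n_states, n_actions, n_states))
g /= g.sum(axis=2, keepdims=True)      # normalize rows into probabilities
v = rng.random((n_states, n_actions))  # stage rewards v(y, b)

V = np.zeros(n_states)
for _ in range(1000):
    # Q(y, b) = v(y, b) + eps * E_g[V(y+) | y, b]
    Q = v + eps * g @ V
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

tau = Q.argmax(axis=1)  # greedy optimal stationary strategy tau : Y -> B
```

Since the Bellman operator is an ε-contraction, the iteration converges geometrically to 𝑉*, and the greedy strategy it induces is Markov, matching the slide's 𝜏 ∶ 𝒴 → ℬ.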
Imperfect Information (POMDP)
Dynamic    𝑥⁺ ∼ 𝑜(𝑥, 𝑏)
Signal     𝑡 ∼ 𝜉(𝑥)
History    ℎ_𝑢 = (𝑡₀, 𝑡₁, …, 𝑡_𝑢, 𝑏₀, 𝑏₁, …, 𝑏_𝑢)
Belief     ℙ_{𝑜,𝜉,𝜏}[𝑥 | ℎ]
Strategy   𝜏 ∶ ℋ → ℬ, or equivalently on beliefs: 𝜏 ∶ Δ(𝒳) → ℬ
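The belief ℙ[𝑥 | ℎ] is maintained recursively by a Bayes filter: predict 𝑥⁺ through the dynamic, then reweight by the likelihood of the observed signal. A minimal sketch with placeholder dynamics 𝑜 and signal kernel 𝜉:

```python
import numpy as np

# One step of the Bayesian belief update for a POMDP.  o[x, b] is a
# distribution over x+ and xi[x] a distribution over signals; all
# numbers here are illustrative placeholders.
n_x, n_b, n_t = 4, 2, 3
rng = np.random.default_rng(1)
o = rng.random((n_x, n_b, n_x)); o /= o.sum(axis=2, keepdims=True)
xi = rng.random((n_x, n_t));     xi /= xi.sum(axis=1, keepdims=True)

def belief_update(belief, b, t):
    """P[x+ | h, b, t]  proportional to  xi(x+)[t] * sum_x P[x|h] o(x,b)[x+]."""
    predicted = belief @ o[:, b, :]     # predict x+ through the dynamic
    posterior = predicted * xi[:, t]    # reweight by signal likelihood
    return posterior / posterior.sum()  # normalize

belief = np.full(n_x, 1 / n_x)          # uniform prior over x
belief = belief_update(belief, b=0, t=2)
```

Because this update is a sufficient statistic for the history, the strategy can indeed be defined on beliefs, 𝜏 ∶ Δ(𝒳) → ℬ.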
Stochastic Games
Dynamic      𝑥⁺ ∼ 𝑜(𝑥, 𝑏¹, 𝑏²)
Signals      ⎧ 𝑡¹ ∼ 𝜉¹(𝑥)
             ⎩ 𝑡² ∼ 𝜉²(𝑥)
Histories    ⎧ ℎ¹_𝑢 = (𝑡¹₀, …, 𝑡¹_𝑢, 𝑏¹₀, …, 𝑏¹_𝑢)
             ⎩ ℎ²_𝑢 = (𝑡²₀, …, 𝑡²_𝑢, 𝑏²₀, …, 𝑏²_𝑢)
Strategies   ⎧ 𝜏¹ ∶ ℋ¹ → ℬ¹
             ⎩ 𝜏² ∶ ℋ² → ℬ²
Beliefs      ⎧ ℙ_{𝑜,𝜉¹,𝜏¹,𝜉²,𝜏²}[𝑥, ℎ² | ℎ¹]
             ⎩ ℙ_{𝑜,𝜉¹,𝜏¹,𝜉²,𝜏²}[𝑥, ℎ¹ | ℎ²]
Existing Approaches
• (Weakly) belief-free equilibrium
• Mean-field equilibrium
• Incomplete theories
Empirical-evidence Equilibria
Motivation
Agent 1 and Agent 2 interact with Nature:
0. Pick arbitrary strategies.
1. Formulate simple but consistent models.
2. Design strategies optimal w.r.t. the models; then go back to 1.
An empirical-evidence equilibrium is a fixed point:
• strategies optimal w.r.t. models
• models consistent with strategies
Example: Asset Management
Trading one asset on the stock market. Each agent's model is based on:
• information published by the company
• observed trading activity
The model can be very different for each agent.
Multiple to Single Agent
[Diagram: from Agent 1's point of view, Agent 2 and Nature are lumped together into a single Nature block.]
Single Agent Setup
[Diagram: an agent interacting with Nature.]
Agent dynamic    𝑦⁺ ∼ 𝑔(𝑦, 𝑏, 𝑡)
Nature dynamic   𝑥⁺ ∼ 𝑜(𝑥, 𝑦, 𝑏)
Signal           𝑡 ∼ 𝜉(𝑥)
Example: Asset Management
Agent dynamic    𝑦⁺ ∼ 𝑔(𝑦, 𝑏, 𝑡)
Nature dynamic   𝑥⁺ ∼ 𝑜(𝑥, 𝑦, 𝑏)
Signal           price 𝑞 ∈ {Low, High}, with 𝑡 ∼ 𝜉(𝑥)
State            holding 𝑦 ∈ {0 .. 𝑁}
Action           sell one, hold, or buy one: 𝑏 ∈ {−1, 0, 1}
Stage cost       𝑞 ⋅ 𝑏
Nature's state 𝑥 represents market sentiment, political climate, and the other traders.
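A toy simulator makes the example concrete. The price process below (a two-state chain with 0.8 persistence) and the numeric price levels are invented for illustration; only the state, action, and stage-cost structure come from the slide:

```python
import random

# Toy asset-management environment: holding y in {0..N}, action
# b in {-1, 0, 1}, price q in {Low, High}, stage cost q * b.
N = 10
PRICES = {"Low": 1.0, "High": 2.0}  # assumed numeric price levels

def step(y, b, q):
    """Apply action b, pay the stage cost q*b, and draw the next price."""
    assert 0 <= y + b <= N, "cannot sell below 0 or hold more than N"
    cost = PRICES[q] * b
    # Arbitrary sticky price chain: stay with probability 0.8.
    q_next = q if random.random() < 0.8 else ("High" if q == "Low" else "Low")
    return y + b, cost, q_next

y, q = 0, "Low"
y, cost, q = step(y, b=1, q=q)  # buy one share at the low price
```

A negative cost (selling, 𝑏 = −1) is revenue, so a good strategy tries to buy Low and sell High.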
Single Agent Setup
Strategy         𝑏 ∼ 𝜏(ℎ)
Signal           𝑡 ∼ 𝜉(𝑥)
Nature dynamic   𝑥⁺ ∼ 𝑜(𝑥, 𝑦, 𝑏)
Agent dynamic    𝑦⁺ ∼ 𝑔(𝑦, 𝑏, 𝑡)
The agent replaces the signal 𝑡 by a simple model 𝑡̂ and acts on the modeled dynamic 𝑦⁺ ∼ 𝑔(𝑦, 𝑏, 𝑡̂):
• 𝑡̂ consistent with 𝜏
• 𝜏 optimal w.r.t. 𝑡̂
Depth-𝑙 Consistency
Consider a binary stochastic process 𝑡:
0100010001001010010110111010000111010101...
• 0-characteristic: ℙ[𝑡 = 0], ℙ[𝑡 = 1]
• 1-characteristic: ℙ[𝑡𝑡⁺ = 00], ℙ[𝑡𝑡⁺ = 01], ℙ[𝑡𝑡⁺ = 10], ℙ[𝑡𝑡⁺ = 11]
• …
• 𝑙-characteristic: probabilities of strings of length 𝑙 + 1
Definition. Two processes 𝑡 and 𝑡′ are depth-𝑙 consistent if they have the same 𝑙-characteristic.
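The 𝑙-characteristic of a realized binary sequence can be estimated by counting sliding windows of length 𝑙 + 1, as in this sketch:

```python
from collections import Counter
from itertools import product

def l_characteristic(seq, l):
    """Empirical l-characteristic: frequencies of length-(l+1) substrings."""
    windows = [seq[i:i + l + 1] for i in range(len(seq) - l)]
    counts = Counter(windows)
    keys = [''.join(s) for s in product('01', repeat=l + 1)]
    return {k: counts[k] / len(windows) for k in keys}

# The binary sequence from the slide.
t = "0100010001001010010110111010000111010101"
char0 = l_characteristic(t, 0)   # estimates P[t = 0], P[t = 1]
char1 = l_characteristic(t, 1)   # estimates P[tt+ = 00], ..., P[tt+ = 11]
```

Two processes are then depth-𝑙 consistent exactly when these dictionaries agree, which is what the model 𝑡̂ must match about the true signal process.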
Depth-𝑙 Consistency: Example
[Diagram: a depth-1 model of the process as a chain with states 𝑨_∅, 𝑨_0, and 𝑨_1; the initial state 𝑨_∅ emits 0 or 1 with probability 0.5 each, and 𝑨_0 and 𝑨_1 emit the next signal with probabilities 0.7 and 0.3.]
Complete Picture
Fix a depth 𝑙 ∈ ℕ. The model state 𝑨 contains the last 𝑙 observed signals, and the model 𝜈 gives the next-signal distribution:
𝜈(𝑨 = (𝑡₁, …, 𝑡_𝑙))[𝑡_{𝑙+1}] = ℙ_𝜏[𝑡_{𝑢+1} = 𝑡_{𝑙+1} | 𝑡_𝑢 = 𝑡_𝑙, …, 𝑡_{𝑢−𝑙+1} = 𝑡₁]
Closed loop: 𝑏 ∼ 𝜏(ℎ), 𝑡 ∼ 𝜉(𝑥), 𝑥⁺ ∼ 𝑜(𝑥, 𝑦, 𝑏), 𝑦⁺ ∼ 𝑔(𝑦, 𝑏, 𝑡)
Fixed point: 𝜏 ↦ 𝜈 (𝜈 consistent with 𝜏) and 𝜈 ↦ 𝜏 (𝜏 optimal w.r.t. 𝜈)
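The consistency map 𝜏 ↦ 𝜈 can be estimated by simulation: run the closed loop under a fixed strategy and count next-signal frequencies after each length-𝑙 signal string. Everything below (state-space sizes, the dynamics 𝑜, 𝜉, 𝑔, and the strategy 𝜏) is a toy placeholder, not the talk's construction:

```python
import numpy as np

# Estimate nu(A)[t+] for l = 1 by simulating the closed loop.
rng = np.random.default_rng(2)
n_x, n_y, n_b, n_t, l = 3, 2, 2, 2, 1

o = rng.random((n_x, n_y, n_b, n_x)); o /= o.sum(axis=-1, keepdims=True)
xi = rng.random((n_x, n_t));          xi /= xi.sum(axis=-1, keepdims=True)
g = rng.integers(0, n_y, (n_y, n_b, n_t))  # deterministic toy agent dynamic
tau = rng.integers(0, n_b, (n_y, n_t))     # toy strategy: b = tau(y, last t)

counts = np.zeros((n_t,) * l + (n_t,))     # counts[A][t+]
x, y, t = 0, 0, 0
for _ in range(100_000):
    b = tau[y, t]
    x = rng.choice(n_x, p=o[x, y, b])      # Nature dynamic x+ ~ o(x, y, b)
    t_next = rng.choice(n_t, p=xi[x])      # signal t ~ xi(x)
    counts[t, t_next] += 1                 # record A = (t,) -> t_next
    y = g[y, b, t_next]                    # agent dynamic
    t = t_next

nu = counts / counts.sum(axis=-1, keepdims=True)  # empirical nu(A)[t+]
```

Pairing this map with an optimality map 𝜈 ↦ 𝜏 (e.g. solving the MDP on the joint state (𝑦, 𝑨) under the model 𝜈) and iterating searches for the fixed point that defines an EEE.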