Empirical-evidence Equilibria in Stochastic Games (Nicolas Dudebout)

  1. Empirical-evidence Equilibria in Stochastic Games Nicolas Dudebout

  2. Outline
     • Stochastic games
     • Empirical-evidence equilibria (EEEs)
     • Open questions in EEEs

  3. Stochastic Games
     • Game theory
     • Markov decision processes

  4. Game Theory
     Decision making: 𝑣 ∶ 𝒝 → ℝ ⟹ 𝑏* ∈ arg max_{𝑏 ∈ 𝒝} 𝑣(𝑏)
     Game theory: 𝑣¹ ∶ 𝒝¹ × 𝒝² → ℝ and 𝑣² ∶ 𝒝¹ × 𝒝² → ℝ
     Nash equilibrium:
     • 𝑏¹* ∈ arg max_{𝑏¹ ∈ 𝒝¹} 𝑣¹(𝑏¹, 𝑏²*)
     • 𝑏²* ∈ arg max_{𝑏² ∈ 𝒝²} 𝑣²(𝑏¹*, 𝑏²)

  5. Example: Battle of the Sexes
     Payoffs (row: player 1, column: player 2):
              G       P
       G    2, 2    0, 1
       P    0, 0    1, 3
     Nash equilibria:
     • (𝐺, 𝐺)
     • (𝑃, 𝑃)
     • (3/4 𝐺 + 1/4 𝑃, 1/3 𝐺 + 2/3 𝑃)
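
     These equilibria are easy to verify numerically. A minimal sketch in Python (the payoff matrices encode the table above; is_nash is my helper, not from the talk):

     ```python
     import numpy as np

     # Payoff matrices from the slide: rows = player 1's action (G, P),
     # columns = player 2's action (G, P).
     U1 = np.array([[2.0, 0.0], [0.0, 1.0]])  # player 1's payoffs
     U2 = np.array([[2.0, 1.0], [0.0, 3.0]])  # player 2's payoffs

     def is_nash(p, q, tol=1e-9):
         """Check that mixed strategies p (player 1) and q (player 2)
         are mutual best responses."""
         v1 = p @ U1 @ q      # player 1's expected payoff
         v2 = p @ U2 @ q      # player 2's expected payoff
         best1 = max(U1 @ q)  # best deviation payoff for player 1
         best2 = max(p @ U2)  # best deviation payoff for player 2
         return v1 >= best1 - tol and v2 >= best2 - tol

     print(is_nash(np.array([1, 0]), np.array([1, 0])))          # (G, G) -> True
     print(is_nash(np.array([0, 1]), np.array([0, 1])))          # (P, P) -> True
     print(is_nash(np.array([3/4, 1/4]), np.array([1/3, 2/3])))  # mixed -> True
     ```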

  6. Markov Decision Process (MDP)
     Dynamic: 𝑦⁺ ∼ 𝑔(𝑦, 𝑏) ⟺ 𝑦_{𝑢+1} ∼ 𝑔(𝑦_𝑢, 𝑏_𝑢)
     Stage cost: 𝑣(𝑦, 𝑏)
     History: ℎ_𝑢 = (𝑦_0, 𝑦_1, …, 𝑦_𝑢, 𝑏_0, 𝑏_1, …, 𝑏_𝑢)
     Strategy: 𝜏 ∶ ℋ → 𝒝
     Utility: 𝑉(𝜏) = 𝔼_{𝑔,𝜏}[ ∑_{𝑢=0}^∞ 𝜀^𝑢 𝑣(𝑦_𝑢, 𝑏_𝑢) ]
     Bellman's equation: 𝑉*(𝑦) = max_{𝑏 ∈ 𝒝} { 𝑣(𝑦, 𝑏) + 𝜀 𝔼_𝑔[𝑉*(𝑦⁺) | 𝑦, 𝑏] }
     Dynamic programming: use knowledge of 𝑔
     Reinforcement learning: learn 𝑔 from repeated interaction

  8. Markov Decision Process (MDP)
     Same setup; for an MDP an optimal strategy can be restricted to state feedback:
     Strategy: 𝜏 ∶ 𝒴 → 𝒝
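
     When 𝑔 is known, Bellman's equation can be solved by value iteration. A minimal sketch, assuming a finite MDP given as arrays (the toy kernel and stage values below are invented for illustration):

     ```python
     import numpy as np

     def value_iteration(g, v, eps, tol=1e-8):
         """g[y, b, y2] = P(y+ = y2 | y, b); v[y, b] = stage value;
         eps in [0, 1) is the discount factor. Iterates the Bellman operator
         V(y) <- max_b { v(y, b) + eps * E_g[V(y+) | y, b] } to a fixed point."""
         V = np.zeros(g.shape[0])
         while True:
             Q = v + eps * (g @ V)  # Q[y, b] = one-step lookahead value
             V_new = Q.max(axis=1)  # greedy over actions
             if np.max(np.abs(V_new - V)) < tol:
                 return V_new
             V = V_new

     # Toy 2-state, 2-action MDP (numbers invented for illustration).
     g = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.5, 0.5], [0.1, 0.9]]])
     v = np.array([[1.0, 0.0], [0.0, 2.0]])
     print(value_iteration(g, v, eps=0.9))
     ```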

  9. Imperfect Information (POMDP)
     Dynamic: 𝑥⁺ ∼ 𝑜(𝑥, 𝑏)
     Signal: 𝑡 ∼ 𝜉(𝑥)
     History: ℎ_𝑢 = (𝑡_0, 𝑡_1, …, 𝑡_𝑢, 𝑏_0, 𝑏_1, …, 𝑏_𝑢)
     Strategy: 𝜏 ∶ ℋ → 𝒝
     Belief: ℙ_{𝑜,𝜉,𝜏}[𝑥 | ℎ]

  11. Imperfect Information (POMDP)
      Same setup; the belief is a sufficient statistic, so the strategy can be restricted to belief feedback:
      Strategy: 𝜏 ∶ Δ(𝒳) → 𝒝
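
      The belief is propagated recursively by a Bayes filter. A minimal sketch over a finite 𝒳 (the kernels o and xi below are invented, and the indexing conventions are mine):

      ```python
      import numpy as np

      def belief_update(belief, b, t, o, xi):
          """One Bayes-filter step: condition on action b and new signal t.
          belief[x] = P(x | h); o[x, b, x2] = P(x+ = x2 | x, b); xi[x, t] = P(t | x)."""
          predicted = belief @ o[:, b, :]  # P(x+ | h, b)
          updated = predicted * xi[:, t]   # multiply by signal likelihood
          return updated / updated.sum()   # normalize

      # Toy 2-state example (made up): after observing signal 1, mass shifts
      # toward the state that emits 1 more often.
      o = np.array([[[0.8, 0.2], [0.5, 0.5]],
                    [[0.3, 0.7], [0.5, 0.5]]])
      xi = np.array([[0.9, 0.1], [0.2, 0.8]])
      print(belief_update(np.array([0.5, 0.5]), b=0, t=1, o=o, xi=xi))
      ```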

  12. Stochastic Games
      Dynamic: 𝑥⁺ ∼ 𝑜(𝑥, 𝑏¹, 𝑏²)
      Signals: 𝑡¹ ∼ 𝜉¹(𝑥), 𝑡² ∼ 𝜉²(𝑥)
      Histories: ℎ_𝑢¹ = (𝑡_0¹, …, 𝑡_𝑢¹, 𝑏_0¹, …, 𝑏_𝑢¹), ℎ_𝑢² = (𝑡_0², …, 𝑡_𝑢², 𝑏_0², …, 𝑏_𝑢²)
      Strategies: 𝜏¹ ∶ ℋ¹ → 𝒝¹, 𝜏² ∶ ℋ² → 𝒝²
      Beliefs: ℙ_{𝑜,𝜉¹,𝜏¹,𝜉²,𝜏²}[𝑥, ℎ² | ℎ¹], ℙ_{𝑜,𝜉¹,𝜏¹,𝜉²,𝜏²}[𝑥, ℎ¹ | ℎ²]

  14. Existing Approaches
      • (Weakly) belief-free equilibrium
      • Mean-field equilibrium
      • Incomplete theories

  15. Empirical-evidence Equilibria

  16. Motivation
      Agent 1 and Agent 2 interact through Nature.
      0. Pick arbitrary strategies.
      1. Formulate simple but consistent models.
      2. Design strategies optimal w.r.t. the models; then go back to 1.
      An empirical-evidence equilibrium is a fixed point of this process (see the sketch below):
      • Strategies optimal w.r.t. models
      • Models consistent with strategies
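
      As a loop, the fixed point can be searched for by alternating the two steps; a schematic sketch (consistent_model and optimal_strategy stand in for steps 1 and 2, they are placeholders, not an API from the talk):

      ```python
      def eee_iteration(strategies, consistent_model, optimal_strategy,
                        max_iters=100):
          """Alternate the two steps from the slide. A fixed point, where the
          strategies reproduce the models they were designed against, is an
          empirical-evidence equilibrium."""
          n = len(strategies)
          for _ in range(max_iters):
              # Step 1: each agent fits a simple model consistent with the
              # signal process its current environment actually generates.
              models = [consistent_model(i, strategies) for i in range(n)]
              # Step 2: each agent best-responds to its own model.
              new = [optimal_strategy(i, models[i]) for i in range(n)]
              if new == strategies:  # fixed point reached: an EEE
                  break
              strategies = new
          return strategies, models
      ```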

  17. Example: Asset Management
      Trading one asset on the stock market. Each agent forms a model based on:
      • information published by the company
      • observed trading activity
      The resulting model can be very different for each agent.

  18. Multiple to Single Agent
      [Diagram: Agent 1 and Agent 2 each interacting with Nature.]

  19. Multiple to Single Agent
      [Diagram: from Agent 1's perspective, Agent 2 and Nature are lumped into a single block, labeled Nature 1.]

  20. Single Agent Setup
      [Diagram: a single agent interacting with Nature.]

  23. Single Agent Setup
      Agent: 𝑦⁺ ∼ 𝑔(𝑦, 𝑏, 𝑡)
      Nature: 𝑥⁺ ∼ 𝑜(𝑥, 𝑦, 𝑏), signal 𝑡 ∼ 𝜉(𝑥)

  24. Example: Asset Management
      State: holding 𝑦 ∈ {0 .. 𝑁}
      Action: sell one, hold, or buy one, 𝑏 ∈ {−1, 0, 1}
      Signal: price 𝑞 ∈ {Low, High}
      Stage cost: 𝑞 ⋅ 𝑏
      Dynamics: 𝑦⁺ ∼ 𝑔(𝑦, 𝑏, 𝑡), 𝑥⁺ ∼ 𝑜(𝑥, 𝑦, 𝑏), 𝑡 ∼ 𝜉(𝑥)
      Nature's state 𝑥 represents market sentiment, political climate, and the other traders.
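
      A minimal simulation of this example. Only the state, action, signal, and cost structure come from the slide; the particular kernels (signal noise, sticky market mood) and the boundary clipping are invented for illustration:

      ```python
      import random

      N = 10                     # maximum holding
      PRICE = {0: 1.0, 1: 2.0}   # signal t: 0 = Low, 1 = High

      def step(x, y, b):
          """One stage. x in {0, 1} is nature's state (market mood), y the
          holding, b in {-1, 0, +1} the trade. Returns (x+, y+, stage cost)."""
          t = x if random.random() < 0.8 else 1 - x       # t ~ xi(x): noisy signal
          q = PRICE[t]
          y_next = min(max(y + b, 0), N)                  # y+ = g(y, b, t), clipped
          x_next = x if random.random() < 0.9 else 1 - x  # x+ ~ o(x, y, b): sticky mood
          return x_next, y_next, q * b                    # pay q to buy, receive q to sell

      x, y, total_cost = 0, 5, 0.0
      for _ in range(100):
          b = random.choice([-1, 0, 1])  # placeholder strategy, not an optimum
          x, y, cost = step(x, y, b)
          total_cost += cost
      print(total_cost, y)
      ```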

  25. Single Agent Setup
      Agent: 𝑏 ∼ 𝜏(ℎ), 𝑦⁺ ∼ 𝑔(𝑦, 𝑏, 𝑡)
      Nature: 𝑥⁺ ∼ 𝑜(𝑥, 𝑦, 𝑏), signal 𝑡 ∼ 𝜉(𝑥)
      Model: 𝑡̂
      • 𝑡̂ consistent with 𝜏
      • 𝜏 optimal w.r.t. 𝑡̂

  30. Single Agent Setup
      Agent: 𝑏 ∼ 𝜏(ℎ), 𝑦⁺ ∼ 𝑔(𝑦, 𝑏, 𝑡̂): the model signal 𝑡̂ replaces the true signal 𝑡 in the agent's design problem
      Nature: 𝑥⁺ ∼ 𝑜(𝑥, 𝑦, 𝑏), signal 𝑡 ∼ 𝜉(𝑥)
      Model: 𝑡̂
      • 𝑡̂ consistent with 𝜏
      • 𝜏 optimal w.r.t. 𝑡̂

  32. Depth-𝑙 Consistency
      Consider a binary stochastic process 𝑡:
      0100010001001010010110111010000111010101...
      • 0-characteristic: ℙ[𝑡 = 0], ℙ[𝑡 = 1]
      • 1-characteristic: ℙ[𝑡𝑡⁺ = 00], ℙ[𝑡𝑡⁺ = 01], ℙ[𝑡𝑡⁺ = 10], ℙ[𝑡𝑡⁺ = 11]
      • ...
      • 𝑙-characteristic: probabilities of strings of length 𝑙 + 1
      Definition: two processes 𝑡 and 𝑡′ are depth-𝑙 consistent if they have the same 𝑙-characteristic.
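
      Empirically, the 𝑙-characteristic of a sample path is just the frequency table of its length-(𝑙+1) windows. A minimal sketch (function names are mine; the tolerance makes the check approximate, since empirical frequencies only match in the limit):

      ```python
      from collections import Counter

      def l_characteristic(path, l):
          """Empirical l-characteristic: frequencies of length-(l+1) windows."""
          windows = [tuple(path[i:i + l + 1]) for i in range(len(path) - l)]
          freq = Counter(windows)
          n = len(windows)
          return {w: c / n for w, c in freq.items()}

      def depth_l_consistent(path1, path2, l, tol=0.05):
          """Approximate depth-l consistency: the two empirical
          l-characteristics agree up to tol on every string."""
          c1, c2 = l_characteristic(path1, l), l_characteristic(path2, l)
          return all(abs(c1.get(w, 0) - c2.get(w, 0)) <= tol
                     for w in set(c1) | set(c2))

      path = [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
      print(l_characteristic(path, 1))  # estimates of P[tt+ = 00], 01, 10, 11
      ```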

  35. Depth-𝑙 Consistency: Example
      [Diagram: a depth-1 model as a Markov chain on memory states 𝑨∅, 𝑨0, 𝑨1; from the initial state 𝑨∅ the next signal is 0 or 1 with probability 0.5 each; from 𝑨0 and 𝑨1 the next signal follows the indicated 0.7/0.3 probabilities.]

  36. Complete Picture
      Fix a depth 𝑙 ∈ ℕ. The model state 𝑨 contains the last 𝑙 observed signals.
      𝜈(𝑨 = (𝑡_1, 𝑡_2, …, 𝑡_𝑙))[𝑡_{𝑙+1}] = ℙ_𝜏[𝑡_{𝑢+1} = 𝑡_{𝑙+1} | 𝑡_𝑢 = 𝑡_𝑙, …, 𝑡_{𝑢−𝑙+1} = 𝑡_1]
      Agent: 𝑏 ∼ 𝜏(ℎ), 𝑦⁺ ∼ 𝑔(𝑦, 𝑏, 𝑡)
      Nature: 𝑥⁺ ∼ 𝑜(𝑥, 𝑦, 𝑏), signal 𝑡 ∼ 𝜉(𝑥)
      Model: 𝑡̂
      • 𝜏 ↦ 𝜈: 𝜈 consistent with 𝜏
      • 𝜈 ↦ 𝜏: 𝜏 optimal w.r.t. 𝜈
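
      The consistency map 𝜏 ↦ 𝜈 can be estimated from a sample path generated under 𝜏: for each length-𝑙 window 𝑨, count the signal that follows it. A minimal sketch (estimate_nu is my name for this estimator):

      ```python
      from collections import Counter, defaultdict

      def estimate_nu(path, l):
          """nu(A)[s] ~ P_tau[t_{u+1} = s | last l signals = A], estimated by
          counting, for each length-l window A, the signal that follows it."""
          counts = defaultdict(Counter)
          for i in range(len(path) - l):
              A = tuple(path[i:i + l])
              counts[A][path[i + l]] += 1
          return {A: {s: c / sum(ctr.values()) for s, c in ctr.items()}
                  for A, ctr in counts.items()}

      path = [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
      print(estimate_nu(path, 1))  # e.g. nu((0,)): next-signal distribution after a 0
      ```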
