Empirical-evidence Equilibria in Stochastic Games Nicolas Dudebout Georgia Institute of Technology
Empirical-evidence Equilibria (EEEs)
[Diagram: Agent 1, Nature, Agent 2]
At a Nash equilibrium in a stochastic game, each agent is playing an optimal strategy for a POMDP.
EEE approach:
0. Pick arbitrary strategies
1. Formulate simple but consistent models
2. Design strategies optimal w.r.t. models, then, back to 1.
The fixed points are EEEs.
Example: Asset management on the stock market
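The EEE approach above alternates between fitting models and computing optimal strategies until nothing changes. A minimal sketch of that loop follows. The function names `fit_model` and `best_response` are hypothetical placeholders supplied by the caller, not part of the talk, and nothing here guarantees that the iteration converges.

```python
def iterate_to_fixed_point(strategy, fit_model, best_response, max_iters=100):
    """Alternate model fitting and strategy design until both stop changing.

    fit_model and best_response are caller-supplied (hypothetical) functions:
      fit_model(strategy)  -> model consistent with the strategy   (step 1)
      best_response(model) -> strategy optimal w.r.t. the model    (step 2)
    Returns (strategy, model) at a fixed point, or None if none was reached.
    """
    model = fit_model(strategy)                  # step 1 on the initial strategy
    for _ in range(max_iters):
        new_strategy = best_response(model)      # step 2
        new_model = fit_model(new_strategy)      # back to step 1
        if new_strategy == strategy and new_model == model:
            return strategy, model               # fixed point reached
        strategy, model = new_strategy, new_model
    return None                                  # no fixed point within max_iters


# Toy usage with integer "strategies" and "models" (purely illustrative).
result = iterate_to_fixed_point(8, lambda s: s // 2, lambda m: m)
```

In the toy usage the two maps shrink the value at each round, so the loop stops at the pair where applying both steps changes nothing.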
Single-agent Setup
[Diagram: Agent, Nature, and Model blocks, linked by the signal $s$ and the mock signal $\hat{s}$]
Objective: $\max_{\sigma} \mathbb{E}_{\sigma}\left[\sum_{t=0}^{\infty} \delta^t u(x_t, a_t, s_t)\right]$
Agent: $x^+ \sim f(x, a, s)$, $a \sim \sigma(x, z)$
Nature: $w^+ \sim n(w, x, a)$, $s \sim \nu(w)$
Model: $z^+ \sim m(z)$, $\hat{s}^+ \sim m(z)$
• $m$ consistent with $\sigma$
• $\sigma$ optimal w.r.t. $m$
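The objective in the single-agent setup is an expected discounted sum of stage utilities. As a small illustration, the sketch below evaluates that sum for one finite, already-observed sequence of utilities. The name `discounted_utility` is hypothetical, and truncating the infinite sum to a finite list is an assumption made only for the example.

```python
def discounted_utility(utilities, delta):
    """Truncated discounted sum: sum over t of delta**t * utilities[t].

    utilities: stage utilities u(x_t, a_t, s_t) along one trajectory (finite list)
    delta:     discount factor, assumed to lie in [0, 1)
    """
    return sum(delta ** t * u for t, u in enumerate(utilities))


# Three stages with utility 1 each and discount factor 0.5.
value = discounted_utility([1.0, 1.0, 1.0], 0.5)
```

Averaging this quantity over many simulated trajectories would give a Monte Carlo estimate of the expectation in the objective.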
Depth-$k$ Consistency
Binary stochastic process $s$: 0100010001001010010110111010000111010101...
• 0 characteristic: $\mathbb{P}[s = 0]$, $\mathbb{P}[s = 1]$
• 1 characteristic: $\mathbb{P}[ss^+ = 00]$, $\mathbb{P}[ss^+ = 10]$, $\mathbb{P}[ss^+ = 01]$, $\mathbb{P}[ss^+ = 11]$
• ...
• $k$ characteristic: probability of strings of length $k + 1$
Definition: Two processes $s$ and $\hat{s}$ are depth-$k$ consistent if they have the same $k$ characteristic
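The $k$ characteristic can be approximated from a single observed string by counting substrings of length $k + 1$. The sketch below does this and compares two strings. The names `characteristic` and `depth_k_consistent` are hypothetical, and the empirical frequencies of a finite string are only estimates of the probabilities in the definition.

```python
from collections import Counter


def characteristic(signal, k):
    """Empirical depth-k characteristic: frequency of each substring of length k + 1."""
    n = len(signal) - k                      # number of windows of length k + 1
    counts = Counter(signal[i:i + k + 1] for i in range(n))
    return {word: c / n for word, c in counts.items()}


def depth_k_consistent(s, s_hat, k, tol=1e-9):
    """True if the two empirical depth-k characteristics agree up to tol."""
    p, q = characteristic(s, k), characteristic(s_hat, k)
    return all(abs(p.get(w, 0.0) - q.get(w, 0.0)) <= tol for w in set(p) | set(q))


# The binary string shown on the slide.
s = "0100010001001010010110111010000111010101"
char0 = characteristic(s, 0)   # frequencies of "0" and "1"
char1 = characteristic(s, 1)   # frequencies of "00", "01", "10", "11"
```

Two strings can agree at depth 0 and still differ at depth 1: "0101" and "1010" contain the same share of zeros and ones, but their pair frequencies differ.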
Complete Picture
Fix a depth $k \in \mathbb{N}$; $z$ contains the last $k$ observed signals
$\hat{s} \sim m(z)$, $s \sim \nu(w)$, $a \sim \sigma(x, z)$
$x^+ \sim f(x, a, s)$, $w^+ \sim n(w, x, a)$, $z^+ \sim m^k(z)$
$m(z = (s^1, s^2, \dots, s^k))[s^{k+1}] = \mathbb{P}_{\sigma}[s_{t+1} = s^{k+1} \mid s_t = s^k, \dots, s_{t-k+1} = s^1]$
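The depth-$k$ model assigns to each window $z$ of the last $k$ signals a distribution over the next signal. A minimal sketch of estimating such a model from one observed signal string by counting is given below. The name `fit_depth_k_model` is hypothetical, and using empirical counts from a single realization in place of the probabilities under $\sigma$ is an assumption of this example.

```python
from collections import Counter, defaultdict


def fit_depth_k_model(signal, k):
    """Estimate m(z)[s_next] by counting, where z is the last k observed signals.

    Returns a dict mapping each observed window z (a string of length k)
    to a dict of empirical next-signal probabilities.
    """
    counts = defaultdict(Counter)
    for t in range(k, len(signal)):
        z = signal[t - k:t]              # the k signals preceding time t
        counts[z][signal[t]] += 1        # what followed that window
    model = {}
    for z, next_counts in counts.items():
        total = sum(next_counts.values())
        model[z] = {s: c / total for s, c in next_counts.items()}
    return model


# An alternating signal is predicted perfectly by a depth-1 model.
model = fit_depth_k_model("0101010", 1)
```

With $k = 0$ the window is empty and the model reduces to a single distribution over signals, which is the depth-0 case.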
Empirical-evidence Optimality
Definition: $(\sigma, m)$ is an empirical-evidence optimum (EEO) for $k$ iff
• $\sigma$ is optimal w.r.t. $m$
• $m$ is depth-$k$ consistent with $\sigma$
Definition: $(\sigma, m)$ is an $\epsilon$ empirical-evidence optimum ($\epsilon$ EEO) for $k$ iff
• $\sigma$ is $\epsilon$ optimal w.r.t. $m$
• $m$ is depth-$k$ consistent with $\sigma$
Existence Result
Theorem: For all $k$ and $\epsilon$, there exists an $\epsilon$ EEO for $k$
Proof sketch: $\sigma \xrightarrow{\text{consistency}} m \xrightarrow{\text{optimality}} \sigma$ is continuous
• Technical assumption ensures ergodicity of $s$
• $\sigma \colon \mathcal{X} \times \mathcal{Z} \to \Delta(\mathcal{A})$ is parametrized over a simplex
• Apply Brouwer's fixed point theorem to the composed map
Multiagent Setup
[Diagram: two agents, each receiving a signal $s^i$ from Nature and a mock signal $\hat{s}^i$ from its own model]
Nature: $w^+ \sim n(w, x^1, a^1, x^2, a^2)$, $(s^1, s^2) \sim \nu(w)$
Agent 1: $x^{1,+} \sim f^1(x^1, a^1, s^1)$, $a^1 \sim \sigma^1(x^1, z^1)$
Agent 2: $x^{2,+} \sim f^2(x^2, a^2, s^2)$, $a^2 \sim \sigma^2(x^2, z^2)$
Model 1: $z^{1,+} \sim m^{1,k^1}(z^1)$, $\hat{s}^1 \sim m^1(z^1)$
Model 2: $z^{2,+} \sim m^{2,k^2}(z^2)$, $\hat{s}^2 \sim m^2(z^2)$
Empirical-evidence Equilibrium
Strategies: $\sigma = (\sigma^1, \sigma^2, \dots, \sigma^N)$
Models: $m = (m^1, m^2, \dots, m^N)$
Depths: $k = (k^1, k^2, \dots, k^N)$
Definition: $(\sigma, m)$ is an empirical-evidence equilibrium (EEE) for $k$ iff
• for all $i$, $\sigma^i$ is optimal w.r.t. $m^i$
• for all $i$, $m^i$ is depth-$k^i$ consistent with $\sigma$
Theorem: For all $k$ and $\epsilon$, there exists an $\epsilon$ EEE for $k$
Learning Setup
0. Pick arbitrary depth-0 models $m$
1. Design strategies $\sigma$ optimal w.r.t. models $m$
2. Formulate consistent models $m^{\text{upd}}$, then, back to 1.
Model update: $m_{t+1}^i = m_t^i + \alpha\,(m_t^{i,\text{upd}} - m_t^i) = (1 - \alpha)\,m_t^i + \alpha\,m_t^{i,\text{upd}}$
Example: asset management
State: holdings $x^i \in \{0, \dots, M\}$
Action: sell one, hold, or buy one, $a^i \in \{-1, 0, 1\}$
Signal: price $p \in \{\text{Low}, \text{High}\}$
Nature: market trend $b \in \{\text{Bull}, \text{Bear}\}$, $w = (b, p)$
Dynamic: $x^{i,+} = x^i + a^i$
Stage cost: $p \cdot a^i$
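Step 2 of the learning loop moves the current model toward a freshly fitted consistent model rather than replacing it outright. The sketch below assumes that this move is a convex combination with step size `alpha`, applied entrywise to a model stored as nested dictionaries. The name `update_model` and the dictionary layout are hypothetical choices for illustration.

```python
def update_model(model, model_upd, alpha):
    """Blend the current model with the newly fitted one.

    Assumes both models share the same keys: model[z][s] is the predicted
    probability of signal s given memory z. Each entry becomes
    (1 - alpha) * current + alpha * updated.
    """
    return {
        z: {s: (1 - alpha) * model[z][s] + alpha * model_upd[z][s] for s in model[z]}
        for z in model
    }


# Depth-0 model over the price signal: the memory z is empty.
current = {"": {"High": 0.5, "Low": 0.5}}
fitted = {"": {"High": 0.9, "Low": 0.1}}
blended = update_model(current, fitted, alpha=0.5)
```

Because each row is a convex combination of two probability distributions, the blended rows still sum to one. A small `alpha` makes the model change slowly between rounds.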
Learning Results: Offline
[Figure: prediction $m_t[\text{High}]$ against time $t$ from 0 to 100; vertical axis from 0 to 1; two curves, labeled 1 and 2]
Learning Results: Online
[Figure: prediction $m_t[\text{High}]$ against time $t$ from 0 to 100; vertical axis from 0 to 1; two curves, labeled 1 and 2]
Concluding Remarks
Comparison with mean-field equilibria
• Identical agents with a specific signal
• Depth-0 model
• Large number of agents to recover Nash equilibrium
Future directions
• Endogenous model ($z^+ \sim m(z, x, a)$)
• Quality of EEEs
• Learning EEEs