Max-plus Stochastic Processes and Control
W.H. Fleming, Brown University
1. Introduction, historical background
2. Max-plus expectations
3. Max-plus SDEs and large deviations
4. Max-plus martingales and differential rule
5. Dynamic programming PDEs and variational inequalities
6. Max-plus stochastic control I: terminal cost
7. Max-plus optimal control II: max-plus additive running cost
8. Merton optimal consumption problem
Historical Background
a) Optimal deterministic control: Pontryagin's principle, Bellman's dynamic programming principle (1950s)
b) Two-player, zero-sum differential games: Isaacs pursuit-evasion games (1950s)
c) Stochastic control: deterministic control theory ignores time-varying disturbances in the dynamics; stochastic differential equation models
Dynamic programming/PDE methods (1960s); changes of probability measure (Girsanov)
d) Freidlin-Wentzell large deviations theory: small random perturbations, rare events (late 1960s)
e) H-infinity control theory (1980s): disturbances not modeled as stochastic processes; a min-max viewpoint
Stochastic vs. deterministic views of uncertainty
v ∈ Ω an "uncertainty"; J(v) a "criterion" or "cost"
Stochastic view: J a random variable on (Ω, F, P); evaluate E[F(J)]
Nonstochastic view: evaluate max_v J(v)
Less conservative viewpoint: evaluate
$$ E^+(J) = \max_{v} \,[\, q(v) + J(v) \,] $$
with q(v) a "likelihood" of v, q(v) ≤ 0, and q(v₀) = 0 for some v₀.
Connection between stochastic and nonstochastic views
F(J) = F_θ(J) = e^{θJ}, θ a risk sensitivity parameter
p_θ(v) probability of v, with p_θ(v) ∼ e^{θ q(v)}
$$ \lim_{\theta \to \infty} \theta^{-1} \log E\big[ e^{\theta J} \big] = E^+(J) $$
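A quick numerical illustration of this limit; a minimal sketch, not from the talk: Ω is taken finite and the values of q and J below are arbitrary.

```python
import numpy as np
from scipy.special import logsumexp

# Finite uncertainty set with likelihoods q(v) <= 0 (sup q = 0) and a
# cost J(v); both chosen arbitrarily for the illustration.
q = np.array([0.0, -0.5, -1.0, -2.0, -0.2])
J = np.array([1.0, 2.5, 3.0, 1.5, 2.0])

E_plus = np.max(q + J)   # max-plus expectation E+(J) = max_v [q(v) + J(v)]

for theta in [1, 10, 100, 1000]:
    # (1/theta) log E[e^{theta J}] under p_theta(v) ~ e^{theta q(v)},
    # computed stably via log-sum-exp
    val = (logsumexp(theta * (q + J)) - logsumexp(theta * q)) / theta
    print(f"theta={theta:5d}: (1/theta) log E[exp(theta J)] = {val:.4f}")

print(f"max-plus expectation E+(J)            = {E_plus:.4f}")
```

As θ grows, the risk-sensitive value approaches max_v [q(v) + J(v)], since sup_v q(v) = 0.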
2. Max-plus expectations
Max-plus addition and multiplication, for −∞ ≤ a, b < ∞:
$$ a \oplus b = \max(a, b), \qquad a \otimes b = a + b $$
Maslov idempotent probability calculus
$$ Q(A) = \sup_{v \in A} q(v) \qquad \text{max-plus probability of } A \subset \Omega $$
$$ E^+(J) = \bigoplus_{v} \,[\, q(v) \otimes J(v) \,] \qquad \text{max-plus expectation of } J $$
Max-plus linearity:
$$ E^+(J_1 \oplus J_2) = E^+(J_1) \oplus E^+(J_2), \qquad E^+(c \otimes J) = c \otimes E^+(J) $$
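A minimal sketch of these operations on a finite Ω, checking max-plus linearity (all numbers are arbitrary):

```python
def oplus(a, b):   # max-plus addition
    return max(a, b)

def otimes(a, b):  # max-plus multiplication
    return a + b

def E_plus(q, J):
    """Max-plus expectation on a finite Omega: max_v [q(v) (x) J(v)]."""
    return max(otimes(qv, Jv) for qv, Jv in zip(q, J))

q  = [0.0, -1.0, -3.0]   # likelihoods, sup q = 0
J1 = [1.0,  4.0,  2.0]
J2 = [2.0,  0.5,  6.0]
c  = 5.0

# E+(J1 (+) J2) = E+(J1) (+) E+(J2)
assert E_plus(q, [oplus(a, b) for a, b in zip(J1, J2)]) == \
       oplus(E_plus(q, J1), E_plus(q, J2))
# E+(c (x) J) = c (x) E+(J)
assert E_plus(q, [otimes(c, j) for j in J1]) == otimes(c, E_plus(q, J1))
print("max-plus linearity verified")
```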
3. Max-plus stochastic differential equations and large deviations
Fleming, Applied Math. Optimiz. 2004
x(s) ∈ R^n the solution to the ODE
$$ dx(s) = f(x(s))\,ds + g(x(s))\,v(s)\,ds, \quad t \le s \le T $$
x(t) = x, v(s) ∈ R^d, v(·) a disturbance control function
v(·) ∈ Ω = L²([t, T]; R^d)
$$ q(v) = -\tfrac{1}{2} \int_t^T |v(s)|^2 \, ds, \qquad J(v) = J(x(\cdot)) $$
$$ E^+[J(x(\cdot))] = \sup_{v(\cdot)} \Big[ J(x(\cdot)) - \tfrac{1}{2} \int_t^T |v(s)|^2 \, ds \Big] $$
Example 1: J(x(·)) = ℓ(x(T)), a terminal cost
Example 2: J(x(·)) = max_{[t,T]} ℓ(x(s)), a max-plus additive running cost
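The max-plus expectation above is a deterministic optimization over disturbances, so it can be approximated directly. A sketch under illustrative assumptions (n = d = 1, f(x) = −x, g ≡ 1, ℓ(x) = −(x − 1)², piecewise-constant v on a time grid; none of this data is from the talk):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: -x                     # illustrative dynamics
ell = lambda x: -(x - 1.0) ** 2      # illustrative terminal cost (Example 1)

t, T, x0, N = 0.0, 1.0, 0.0, 100
dt = (T - t) / N

def neg_value(v):
    """-(J(x(.)) + q(v)) for a piecewise-constant disturbance v; g = 1."""
    x = x0
    for vk in v:
        x += (f(x) + vk) * dt        # Euler step of the ODE
    return -(ell(x) - 0.5 * np.sum(v ** 2) * dt)

res = minimize(neg_value, np.zeros(N), method="L-BFGS-B")
print("E+[l(x(T))] ~", -res.fun)
```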
Assumptions: f, g, ℓ ∈ C¹; f_x, g, g_x, ℓ, ℓ_x bounded
Connection with large deviations: X^θ(s) the solution to the SDE
$$ dX^\theta(s) = f(X^\theta(s))\,ds + \theta^{-1/2} g(X^\theta(s))\,dw(s), \quad t \le s \le T $$
X^θ(t) = x, w(s) a d-dimensional Brownian motion
In Example 1,
$$ \lim_{\theta \to \infty} \theta^{-1} \log E\big[ e^{\theta \ell(X^\theta(T))} \big] = E^+[\ell(x(T))] $$
In Example 2,
$$ \lim_{\theta \to \infty} \theta^{-1} \log E\Big[ \int_t^T e^{\theta \ell(X^\theta(s))} \, ds \Big] = E^+\big[ \max_{[t,T]} \ell(x(s)) \big] $$
If L = e^ℓ, then L^θ = e^{θℓ}.
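A rough Monte Carlo sanity check of the Example 1 limit, with the same illustrative data as the optimization sketch above (Euler-Maruyama paths of X^θ; convergence in θ is slow and the estimate is noisy, so this only indicates the trend):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: -x                     # same illustrative data as above
ell = lambda x: -(x - 1.0) ** 2      # g = 1

t, T, x0, N, paths = 0.0, 1.0, 0.0, 100, 200_000
dt = (T - t) / N

for theta in [5.0, 20.0, 80.0]:
    x = np.full(paths, x0)
    for _ in range(N):               # Euler-Maruyama with noise theta^{-1/2} dw
        x += f(x) * dt + theta ** -0.5 * rng.normal(0.0, np.sqrt(dt), paths)
    val = np.log(np.mean(np.exp(theta * ell(x)))) / theta
    print(f"theta={theta:5.1f}: (1/theta) log E[exp(theta l)] = {val:.4f}")
```

The printed values should drift toward the max-plus expectation computed by the optimization sketch above.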
4. Max-plus martingales and differential rule
Conditional likelihood of v, given A ⊂ Ω:
$$ q(v \mid A) = q(v) - \sup_{\omega \in A} q(\omega) \ \ \text{if } v \in A, \qquad q(v \mid A) = -\infty \ \ \text{if } v \notin A $$
Write v^τ = v|_{[t,τ]}. Then
$$ q(v \mid v^\tau) = -\tfrac{1}{2} \int_\tau^T |v(s)|^2 \, ds $$
M(s) = M(s, v^s) is a max-plus martingale if
$$ E^+[M(s) \mid v^\tau] = M(\tau), \quad t \le \tau < s \le T $$
Max-plus differential rule
$$ H(x, p) = f(x) \cdot p + \tfrac{1}{2} |p\,g(x)|^2, \quad x, p \in R^n $$
If φ ∈ C¹_b([0, T] × R^n) and x(s) is a solution of the ODE on [t, T] with t ≥ 0, then
$$ d\varphi(s, x(s)) = \big[ \varphi_t(s, x(s)) + H(x(s), \varphi_x(s, x(s))) \big] \, ds + dM(s) $$
$$ M(s) = \int_t^s \big[ \zeta(r) \cdot v(r) - \tfrac{1}{2} |\zeta(r)|^2 \big] \, dr, \qquad \zeta(r) = \varphi_x(r, x(r))\, g(x(r)) $$
M(s) is a max-plus martingale.
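The differential rule is a pathwise identity, so it can be checked numerically. A sketch with illustrative one-dimensional data (f(x) = −x, g ≡ 1, φ(s, x) = e^{−s} sin x, v(s) = cos 3s; all chosen for the example):

```python
import numpy as np

f = lambda x: -x
g = lambda x: 1.0
v = lambda s: np.cos(3 * s)

phi   = lambda s, x: np.exp(-s) * np.sin(x)
phi_t = lambda s, x: -np.exp(-s) * np.sin(x)
phi_x = lambda s, x: np.exp(-s) * np.cos(x)
H = lambda x, p: f(x) * p + 0.5 * (p * g(x)) ** 2

t, T, N, x = 0.0, 1.0, 4000, 0.5
dt = (T - t) / N
max_err = 0.0
for k in range(N):
    s = t + k * dt
    dx = (f(x) + g(x) * v(s)) * dt           # Euler step of the ODE
    lhs = (phi(s + dt, x + dx) - phi(s, x)) / dt
    zeta = phi_x(s, x) * g(x)
    dM = zeta * v(s) - 0.5 * zeta ** 2       # dM(s)/ds
    rhs = phi_t(s, x) + H(x, phi_x(s, x)) + dM
    max_err = max(max_err, abs(lhs - rhs))
    x += dx
print("max |dphi/ds - (phi_t + H + dM/ds)| =", max_err)  # O(dt)
```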
Backward PDE
$$ \varphi_t + H(x, \varphi_x) = 0 $$
If φ satisfies the backward PDE, then M(s) = φ(s, x(s)) is a max-plus martingale. Taking τ = t, s = T (with terminal data φ(T, x) = ℓ(x)),
$$ \varphi(t, x) = E^+_{tx}[\varphi(T, x(T))] = E^+_{tx}[\ell(x(T))] $$
5. Dynamic programming PDEs and variational inequalities
A) Terminal cost problem: value function
$$ W(t, x) = E^+_{tx}[\ell(x(T))] $$
Dynamic programming principle:
$$ W(\tau, x(\tau)) = \sup_{v(\cdot)} \Big[ -\tfrac{1}{2} \int_\tau^s |v(r)|^2 \, dr + W(s, x(s)) \Big] $$
which is equivalent to W(s, x(s)) being a max-plus martingale.
W is Lipschitz continuous and satisfies the backward PDE almost everywhere and in the viscosity sense:
$$ 0 = W_t + H(x, W_x), \quad 0 \le t \le T, \ x \in R^n, \qquad W(T, x) = \ell(x) $$
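In the special case f ≡ 0, g ≡ 1 the PDE is W_t + ½W_x² = 0 and W has an explicit Hopf-Lax representation, which lets one check the dynamic programming principle on a grid. A sketch (the terminal cost ℓ is illustrative):

```python
import numpy as np

# For f = 0, g = 1:  W(t, x) = sup_y [ l(y) - |y - x|^2 / (2 (T - t)) ]
ell = lambda y: -np.abs(y - 1.0)     # illustrative Lipschitz terminal cost
T = 1.0
ygrid = np.linspace(-10.0, 10.0, 4001)

def hopf_lax(x, data_vals, tau):
    """sup over the grid of [ data(y) - |y - x|^2 / (2 tau) ]."""
    return np.max(data_vals - (ygrid - x) ** 2 / (2.0 * tau))

x, t, s = 0.3, 0.0, 0.5
ell_vals = ell(ygrid)

direct = hopf_lax(x, ell_vals, T - t)                      # one step t -> T
W_s = np.array([hopf_lax(y, ell_vals, T - s) for y in ygrid])
composed = hopf_lax(x, W_s, s - t)                         # t -> s -> T

print("one step :", direct)
print("two steps:", composed)   # agree up to grid error (the DPP)
```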
B) Max-plus additive running cost: value function
$$ V(t, x) = E^+_{tx}\Big[ {\oplus}\!\!\int_t^T \ell(x(s)) \, ds \Big] = E^+_{tx}\big[ \max_{[t,T]} \ell(x(s)) \big] $$
Since E^+_{tx} is max-plus linear,
$$ V(t, x) = \max_{[t,T]} E^+_{tx}[\ell(x(s))] $$
Dynamic programming principle:
$$ V(t, x) = E^+_{tx}\Big[ \big( {\oplus}\!\!\int_t^s \ell(x(r)) \, dr \big) \oplus V(s, x(s)) \Big] $$
V is Lipschitz continuous and satisfies, almost everywhere and in the viscosity sense, the variational inequality
$$ 0 = \max\big[ \ell(x) - V(t, x), \ V_t + H(x, V_x) \big], \quad 0 \le t \le T, \ x \in R^n, \qquad V(T, x) = \ell(x) $$
Idea of proof: both terms on the right are ≤ 0. Two cases: if ℓ(x) = V(t, x), we are done; if ℓ(x) < V(t, x), a standard control argument applies.
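A crude way to see the variational inequality numerically: step the PDE backward from V(T, ·) = ℓ with a Lax-Friedrichs approximation of H and project onto the obstacle at each step. A sketch with illustrative one-dimensional data (f(x) = −x, g ≡ 1, ℓ(x) = e^{−x²}); the scheme and its parameters are chosen for the example, not taken from the talk:

```python
import numpy as np

f = lambda x: -x
ell = lambda x: np.exp(-x ** 2)          # bounded, with bounded gradient

L, nx, T, nt = 4.0, 401, 1.0, 4000
xg = np.linspace(-L, L, nx)
dx, dt = xg[1] - xg[0], T / nt
alpha = np.abs(f(xg)).max() + 1.0        # rough bound on |H_p|

V = ell(xg)
for _ in range(nt):                      # backward in time: V <- V + dt * H
    p = np.gradient(V, dx)               # central difference V_x
    H = f(xg) * p + 0.5 * p ** 2
    visc = alpha * (np.roll(V, -1) - 2 * V + np.roll(V, 1)) / (2 * dx)
    Vnew = V + dt * (H + visc)           # Lax-Friedrichs step
    Vnew[0], Vnew[-1] = V[0], V[-1]      # crude boundary freeze
    V = np.maximum(ell(xg), Vnew)        # obstacle: l(x) - V <= 0
print("V(0, 0) ~", V[nx // 2])
```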
Infinite time horizon bounds
Take t = 0 and T large. If W(x) ∈ C¹ with
$$ \ell(x) \le W(x), \qquad H(x, W_x(x)) \le 0, $$
then V(0, x; T) ≤ W(x). Equivalently, for 0 ≤ s ≤ T and x = x(0),
$$ \ell(x(s)) \le \tfrac{1}{2} \int_0^s |v(r)|^2 \, dr + W(x) $$
A nonlinear H-infinity control inequality.
Example: f(0) = 0, x · f(x) ≤ −c|x|² with c > 0,
$$ 0 \le \ell(x) \le M|x|^2, \qquad W(x) = K|x|^2, \qquad M \le K, \quad \|g\|^2 K \le c $$
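A direct check of the H-infinity inequality for this example, with concrete numbers satisfying the constraints (n = d = 1, f(x) = −cx, g ≡ ν, ℓ(x) = Mx², W(x) = Kx²; random piecewise-constant disturbances):

```python
import numpy as np

rng = np.random.default_rng(1)
c, nu, M, K = 1.0, 0.5, 2.0, 3.0         # M <= K and nu^2 K <= c
assert M <= K and nu ** 2 * K <= c

T, N, x0 = 5.0, 2000, 1.5
dt = T / N
worst = np.inf
for _ in range(200):                     # random disturbance samples
    v = rng.normal(0.0, 1.0, N)
    x, cost = x0, 0.0
    for k in range(N):
        # slack of  l(x(s)) <= (1/2) int_0^s |v|^2 dr + W(x(0))
        worst = min(worst, 0.5 * cost + K * x0 ** 2 - M * x ** 2)
        x += (-c * x + nu * v[k]) * dt
        cost += v[k] ** 2 * dt
print("minimal slack over sampled paths:", worst)   # should be >= 0
```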
6. Max-plus stochastic control I: terminal cost
Fleming-Kaise-Sheu, Applied Math. Optimiz. 2010
x(s) ∈ R^n state; u(s) ∈ U control (U compact); v(s) ∈ R^d disturbance control
$$ dx(s) = f(x(s), u(s))\,ds + g(x(s), u(s))\,v(s)\,ds, \quad t \le s \le T, \qquad x(t) = x $$
The control u(s) is chosen "depending on the past of v(·) up to time s."
Terminal cost criterion: minimize E^+_{tx}[ℓ(x(T))]
Corresponding risk-sensitive stochastic control problem: choose a progressively measurable control to minimize
$$ E_{tx}\big[ e^{\theta \ell(X^\theta(T))} \big] $$
As θ → ∞, we obtain a two-player differential game: the minimizing player chooses u(s); the maximizing player chooses v(s).
Game payoff
$$ P(t, x; u, v) = -\tfrac{1}{2} \int_t^T |v(s)|^2 \, ds + \ell(x(T)) $$
We want the upper differential game value (not the lower value).
Illustrative example (Merton terminal wealth problem)
x(s) > 0 wealth at time s; u(s) fraction of wealth in the risky asset; 1 − u(s) fraction in the riskless asset
Riskless interest rate = 0
$$ \frac{dx(s)}{ds} = x(s)\,u(s)\,[\mu + \nu v(s)], \quad t \le s \le T, \qquad x(t) = x $$
f(x, u) = μxu, g(x, u) = νxu
Usual terminal wealth problem, parameter θ: choose u(s) to minimize
$$ E_{tx}\big[ e^{\theta \ell(X^\theta(T))} \big] $$
Take HARA utility with parameter −θ ≪ 0: ℓ(x) = −log x, so x^{−θ} = e^{−θ log x}.
$$ \log x(s) = \log x + \int_t^s u(r)[\mu + \nu v(r)] \, dr $$
$$ P(t, x; u, v) = -\log x + \int_t^T \tilde P(u(r), v(r)) \, dr, \qquad \tilde P(u, v) = -u(\mu + \nu v) - \tfrac{1}{2} v^2 $$
$$ \min_u \max_v \tilde P(u, v) = \min_u \big[ -\mu u + \tfrac{1}{2} \nu^2 u^2 \big] = -\frac{\mu^2}{2\nu^2} $$
The minimum is attained at u = u* = μ/ν². The optimal control is u(s) = u* for all s, and
$$ E^+\big[ -\log x^*(T) \big] = -\log x - \Lambda(T - t), \qquad \Lambda = \mu^2 / 2\nu^2 $$
Λ is the max-plus optimal growth rate.
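The saddle point can be confirmed by brute force on a grid (μ and ν below are arbitrary illustrative values):

```python
import numpy as np

mu, nu = 0.3, 0.4                        # illustrative parameters
u = np.linspace(-5, 5, 2001)[:, None]
v = np.linspace(-10, 10, 2001)[None, :]
P = -u * (mu + nu * v) - 0.5 * v ** 2    # P~(u, v)

inner = P.max(axis=1)                    # max over v for each u
i = inner.argmin()                       # then min over u
print("grid min-max:", inner[i], " exact -mu^2/(2 nu^2):", -mu ** 2 / (2 * nu ** 2))
print("grid u*     :", u[i, 0],  " exact mu/nu^2      :", mu / nu ** 2)
```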
Elliott-Kalton upper and lower differential game values
An Elliott-Kalton strategy α for the minimizer (a progressive strategy): u(s) = α[v](s), where
$$ v(r) = \tilde v(r) \ \text{a.e. in } [t, s] \ \Rightarrow \ \alpha[v](r) = \alpha[\tilde v](r) \ \text{a.e. in } [t, s] $$
Γ_EK = {EK strategies α}
The lower game value is
$$ \inf_{\alpha \in \Gamma_{EK}} E^+_{tx}[\ell(x(T))] = \inf_{\alpha \in \Gamma_{EK}} \sup_{v(\cdot)} P(t, x; \alpha[v], v) $$
We want the upper game value. Let
Γ = {EK strategies α : α[v](s) is left continuous with limits on the right}
Then W(t, x) = inf_{α ∈ Γ} E^+_{tx}[ℓ(x(T))] is the upper EK value. It is Lipschitz continuous and satisfies (in the viscosity sense) the Isaacs PDE
$$ 0 = W_t + \min_{u \in U} H^u(x, W_x), \quad t \le T, \qquad W(T, x) = \ell(x) $$
$$ H^u(x, p) = f(x, u) \cdot p + \tfrac{1}{2} |p\,g(x, u)|^2 = f(x, u) \cdot p + \max_{v \in R^d} \big[ p\,g(x, u)\,v - \tfrac{1}{2} |v|^2 \big] $$
Recipe for an optimal control policy:
$$ u^*(s, x(s)) \in \arg\min_{u \in U} H^u(x(s), W_x(s, x(s))) $$
Merton terminal wealth problem with non-HARA utility
$$ H^u(x, p) = \mu x u p + \frac{\nu^2}{2} x^2 u^2 p^2, \qquad \min_u H^u(x, p) = -\frac{\mu^2}{2\nu^2} = -\Lambda $$
$$ W(t, x) = \ell(x) - \Lambda(T - t), \qquad x\,u^*(x) = -\frac{\mu}{\nu^2\, \ell_x(x)} $$
Example: exponential utility ℓ(x) = −x, so x u*(x) = μ/ν².
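A last numerical check of the non-HARA formulas for exponential utility ℓ(x) = −x (so ℓ_x ≡ −1); the point x and the parameters μ, ν are arbitrary:

```python
import numpy as np

mu, nu = 0.3, 0.4
Lam = mu ** 2 / (2 * nu ** 2)

def H(x, u, p):
    """H^u(x, p) = mu x u p + (nu^2 / 2) x^2 u^2 p^2."""
    return mu * x * u * p + 0.5 * nu ** 2 * x ** 2 * u ** 2 * p ** 2

x, p = 2.0, -1.0                         # p = l_x(x) for l(x) = -x
ugrid = np.linspace(-10, 10, 200001)
vals = H(x, ugrid, p)
i = vals.argmin()                        # grid minimizer of u -> H^u(x, p)
print("min_u H^u   :", vals[i], " exact -Lambda:", -Lam)
print("x u* (grid) :", x * ugrid[i], " exact mu/nu^2:", mu / nu ** 2)
```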