

  1. Time Inconsistent Optimal Control and Mean Variance Optimization

Tomas Björk, Stockholm School of Economics
Agatha Murgoci, Copenhagen Business School
Xunyu Zhou, Oxford University

Conference in honour of Walter Schachermayer, Wien 2010

– Typeset by FoilTEX –

  2. Contents

• Recap of DynP.
• Problem formulation.
• Discrete time.
• Continuous time.
• Example: Dynamic mean-variance optimization.

  3. Standard problem

We are standing at time t = 0 in state X_0 = x_0.

    \max_u E\Big[ \int_0^T h(s, X_s, u_s)\, ds + F(X_T) \Big]

    dX_t = \mu(t, X_t, u_t)\, dt + \sigma(t, X_t, u_t)\, dW_t

For simplicity we assume that

• X is scalar.
• The adapted control u_t is scalar with no restrictions.

We denote this problem by P. We restrict ourselves to feedback controls of the form u_t = u(t, X_t).
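The controlled dynamics above are easy to simulate. Below is a minimal Euler–Maruyama sketch; the coefficient functions and the feedback law are illustrative assumptions standing in for \mu, \sigma and u(t, x), not anything specified on the slides.

```python
import numpy as np

# Hypothetical coefficients and feedback law (assumptions for illustration):
def mu(t, x, u):      return 0.05 * x + 0.1 * u
def sigma(t, x, u):   return 0.2 * u
def u_feedback(t, x): return 0.5 * x

def simulate(x0=1.0, T=1.0, n_steps=1000, seed=0):
    """Euler-Maruyama path of dX = mu dt + sigma dW under the feedback control;
    returns the terminal value X_T."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = x0
    for k in range(n_steps):
        t = k * dt
        u = u_feedback(t, x)
        dw = rng.normal(0.0, np.sqrt(dt))
        x += mu(t, x, u) * dt + sigma(t, x, u) * dw
    return x

print(simulate())
```

With a fixed seed the path is reproducible, so Monte Carlo estimates of E[F(X_T)] under a candidate feedback law can be built by averaging `simulate` over seeds.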

  4. Dynamic Programming

We embed the problem P in a family of problems P_{tx}:

    P_{tx}: \max_u E_{t,x}\Big[ \int_t^T h(s, X_s, u_s)\, ds + F(X_T) \Big]

    dX_s = \mu(s, X_s, u_s)\, ds + \sigma(s, X_s, u_s)\, dW_s, \quad X_t = x

The original problem corresponds to P_{0, x_0}.

  5. Bellman

We now have the Bellman optimality principle, which says that the family \{P_{t,x};\ t \ge 0,\ x \in R\} is time consistent. More precisely: if \hat u is optimal on the time interval [t, T], then it is also optimal on the sub-interval [s, T] for every s with t \le s \le T.

We can easily derive the Hamilton–Jacobi–Bellman equation HJB:

    V_t(t,x) + \sup_u \Big\{ h(t,x,u) + \mu(t,x,u) V_x(t,x) + \tfrac{1}{2}\sigma^2(t,x,u) V_{xx}(t,x) \Big\} = 0,

    V(T,x) = F(x)

  6. Three Disturbing Examples

Hyperbolic discounting (Ekeland–Lazrak–Pirvu):

    \max_u E_{t,x}\Big[ \int_t^T \varphi(s-t)\, h(c_s)\, ds + \varphi(T-t) F(X_T) \Big]

Mean variance utility (Basak–Chabakauri):

    \max_u \; E_{t,x}[X_T] - \frac{\gamma}{2} \mathrm{Var}_{t,x}(X_T)

Endogenous habit formation:

    \max_u E_{t,x}\Big[ \ln\Big( \frac{X_T}{x - \beta} \Big) \Big]

    dX_t = [r X_t + (\alpha - r) u_t]\, dt + \sigma u_t\, dW_t
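The time inconsistency of the mean-variance objective can be seen concretely in a tiny two-period binomial model (all parameters here are illustrative assumptions, not from the slides): X_{n+1} = X_n + u_n Z_{n+1} with Z = +1 w.p. p and -1 w.p. 1-p. The pre-committed plan chosen at time 0 prescribes time-1 controls that differ, state by state, from what re-optimization at time 1 would choose.

```python
import numpy as np

# Objective at time n: E_n[X_2] - (gamma/2) * Var_n(X_2).
p, gamma = 0.6, 1.0
m, v = 2 * p - 1, 1 - (2 * p - 1) ** 2   # mean and variance of Z

# Pre-committed time-0 problem: choose (u0, u1_up, u1_down) jointly.
# Writing out J(u0, u1_up, u1_down) with the total-variance decomposition
# Var_0 = E_0[Var_1] + Var_0[E_1] and setting the three first-order
# conditions to zero gives a linear system A @ x = b:
A = np.array([
    [-2*gamma*(1-p)*m, -gamma*v - gamma*(1-p)*m**2,  gamma*(1-p)*m**2],
    [ 2*gamma*p*m,      gamma*p*m**2,               -gamma*v - gamma*p*m**2],
    [-4*gamma*p*(1-p), -2*gamma*p*(1-p)*m,           2*gamma*p*(1-p)*m],
])
b = np.array([-m, -m, -m])
u0, u1_up, u1_down = np.linalg.solve(A, b)

# Re-optimizing at time 1 instead gives the myopic control in BOTH states:
u1_reopt = m / (gamma * v)

print(u1_up, u1_down, u1_reopt)   # three different numbers: time inconsistency
```

The pre-committed time-1 controls hedge the variance of the conditional mean (the Var_0[E_1] term) and hence depend on the realized state, while the re-optimized control does not; this is exactly why Bellman's principle fails here.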

  7. Moral

• These types of problems are not time consistent.
• We cannot use DynP.
• In fact, in these cases it is unclear what we mean by “optimality”.

Possible ways out:

• Easy way: Dismiss the problem as being silly.
• Pre-commitment: Solve (somehow) the problem P_{0,x_0} and ignore the fact that, later on, your “optimal” control will no longer be viewed as optimal.
• Game theory: Take the time inconsistency seriously. View the problem as a game and look for a Nash equilibrium point.

We use the game theoretic approach.

  8. Our Basic Problem

    \max_u \; E_{t,x}[F(x, X_T)] + G\big(x, E_{t,x}[X_T]\big)

    dX_s = \mu(X_s, u_s)\, ds + \sigma(X_s, u_s)\, dW_s, \quad X_t = x

This can be extended considerably. For simplicity we will consider the easier problem

    \max_u \; E_{t,x}[F(X_T)] + G\big(E_{t,x}[X_T]\big)

  9. The Game Theoretic Approach

• This is a bit delicate to formalize in continuous time.
• Thus we turn to discrete time, and then go to the limit.

  10. Discrete Time

Given: a controlled Markov process \{X_n : n = 0, 1, \ldots, T\}. At any time n we can change the transition probabilities for X_n \to X_{n+1} by choosing a control value u \in R.

Players:

• For each point in time n there is a player – “player No n” or “P_n”.
• P_n chooses the feedback control law u_n(X_n).
• A sequence of control laws u_0, \ldots, u_{T-1} is denoted by \mathbf{u}.
• Given a sequence \mathbf{u} of control laws, the value function for P_n is defined by

    J_n(x, \mathbf{u}) = E_{n,x}[F(X^{\mathbf{u}}_T)] + G\big(E_{n,x}[X^{\mathbf{u}}_T]\big)

  11. Subgame Perfect Nash Equilibrium

The value function for P_n was defined by

    J_n(x, \mathbf{u}) = E_{n,x}[F(X^{\mathbf{u}}_T)] + G\big(E_{n,x}[X^{\mathbf{u}}_T]\big)

We see that J_n(x, \mathbf{u}) depends on (n, x) and u_n, u_{n+1}, \ldots, u_{T-1}.

Definition:

• The control law \hat{\mathbf{u}} is an equilibrium strategy if the following holds for each fixed n:
  – Assume that P_k uses \hat u_k(\cdot) for k = n+1, \ldots, T-1.
  – Then it is optimal for player No n to use \hat u_n(\cdot).
• The equilibrium value function is defined by

    V_n(x) = J_n(x, \hat{\mathbf{u}})
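The equilibrium definition translates directly into a backward recursion: player n best-responds over a control grid, given the (already defined) equilibrium laws of players n+1, …, T-1. A sketch in an illustrative binomial mean-variance model; the dynamics, parameters and control grid are all assumptions chosen for illustration.

```python
from functools import lru_cache

# Illustrative model: X_{n+1} = X_n + u*Z, Z = +1 w.p. p else -1, and
# F(x) = x - (gamma/2)x^2, G(x) = (gamma/2)x^2, so that
# J_n = E_n[X_T] - (gamma/2) Var_n(X_T).
T, p, gamma = 3, 0.6, 1.0
m, v = 2 * p - 1, 1 - (2 * p - 1) ** 2
U_GRID = [round(0.01 * k, 2) for k in range(51)]   # candidate control values

def F(x): return x - 0.5 * gamma * x * x
def G(x): return 0.5 * gamma * x * x

@lru_cache(maxsize=None)
def u_hat(n, x):
    """Equilibrium control of player n: best response given that players
    n+1, ..., T-1 play their (recursively defined) equilibrium laws."""
    return max(U_GRID, key=lambda u: J(n, x, u))

def J(n, x, u):
    """J_n(x, u) when player n uses u and all later players use u_hat."""
    ef, ex = moments(n, x, u)
    return ef + G(ex)

def moments(n, x, u):
    """(E[F(X_T)], E[X_T]) from state (n, x): control u now, equilibrium after."""
    if n == T - 1:
        branches = [(p, x + u), (1 - p, x - u)]
        return (sum(q * F(y) for q, y in branches),
                sum(q * y for q, y in branches))
    ef = ex = 0.0
    for q, y in [(p, x + u), (1 - p, x - u)]:
        y = round(y, 6)
        ef_y, ex_y = moments(n + 1, y, u_hat(n + 1, y))
        ef += q * ef_y
        ex += q * ex_y
    return ef, ex

controls = [u_hat(n, 1.0) for n in range(T)]
print(controls)
```

With these dynamics (no interest rate) the recursion recovers the constant myopic control m/(gamma*v) at every stage, up to grid resolution; this matches the continuous-time equilibrium control of the mean-variance example with r = 0.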

  12. The infinitesimal operator

Let \{f_n(\cdot)\}_{n=0}^T be a sequence of real valued functions.

Def: For a fixed control value u \in R, the infinitesimal operator A^u is defined by

    (A^u f)_n(x) = E[\, f_{n+1}(X_{n+1}) - f_n(x) \mid X_n = x,\ u_n = u \,]

Def: For a fixed control law \mathbf{u}, the operator A^{\mathbf{u}} is defined by

    (A^{\mathbf{u}} f)_n(x) = E[\, f_{n+1}(X_{n+1}) - f_n(x) \mid X_n = x,\ u_n = u_n(x) \,]

  13. Important Idea

It turns out that a fundamental role is played by the function sequence f_n defined by

    f_n(x) = E_{n,x}\big[ X^{\hat{\mathbf{u}}}_T \big]

where \hat{\mathbf{u}} is the equilibrium strategy. The process f_n(X_n) is of course a martingale under the equilibrium control \hat{\mathbf{u}}, so we have

    A^{\hat{\mathbf{u}}} f_n(x) = 0, \quad f_T(x) = x.

  14. Extending HJB

Proposition: The equilibrium value function V_n(x) and the function f_n(x) satisfy the system

    \sup_u \big\{ (A^u V)_n(x) - (A^u (G \circ f))_n(x) + (H^u f)_n(x) \big\} = 0, \quad V_T(x) = F(x) + G(x)

    A^{\hat{\mathbf{u}}} f_n(x) = 0, \quad f_T(x) = x,

where

    (H^u f)_n(x) = G\big( E_{n,x}[f_{n+1}(X^u_{n+1})] \big) - G(f_n(x)), \quad f_n(x) = E_{n,x}\big[ X^{\hat{\mathbf{u}}}_T \big]
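The proposition can be checked numerically in a simple case. In the illustrative binomial mean-variance model below (an assumption chosen for illustration, not a model from the slides), the equilibrium control is the constant c = m/(gamma*v), and f_n, V_n have closed forms; the code evaluates the three operators straight from their definitions and confirms that the sup over u is attained at c with value 0.

```python
import numpy as np

# Illustrative model: X_{n+1} = X_n + u*Z with Z = +1 w.p. p, -1 w.p. 1-p;
# F(x) = x - (gamma/2)x^2, G(x) = (gamma/2)x^2. With equilibrium control
# c = m/(gamma*v) at every stage:
#   f_n(x) = E_{n,x}[X_T] = x + (T-n)*c*m,
#   V_n(x) = x + (T-n)*c*m - (gamma/2)*(T-n)*c^2*v.
T, p, gamma = 4, 0.6, 1.0
q = 1 - p
m, v = 2 * p - 1, 1 - (2 * p - 1) ** 2
c = m / (gamma * v)

def G(x): return 0.5 * gamma * x * x
def f(n, x): return x + (T - n) * c * m
def V(n, x): return x + (T - n) * c * m - 0.5 * gamma * (T - n) * c * c * v

def Phi(n, x, u):
    """(A^u V)_n(x) - (A^u (G o f))_n(x) + (H^u f)_n(x), from the definitions."""
    E = lambda g: p * g(x + u) + q * g(x - u)          # one-step expectation
    AuV  = E(lambda y: V(n + 1, y)) - V(n, x)
    AuGf = E(lambda y: G(f(n + 1, y))) - G(f(n, x))
    Huf  = G(E(lambda y: f(n + 1, y))) - G(f(n, x))
    return AuV - AuGf + Huf

us = np.linspace(0.0, 1.0, 2001)
at_c  = max(abs(Phi(n, 1.0, c)) for n in range(T))          # ~ 0 at u = c
sup_u = max(Phi(n, 1.0, u) for n in range(T) for u in us)   # <= 0 on the grid
martingale = abs(p * f(1, 1.0 + c) + q * f(1, 1.0 - c) - f(0, 1.0))
print(at_c, sup_u, martingale)
```

Note how the -A^u(G o f) and +H^u f terms combine into G(E[f_{n+1}]) - E[G(f_{n+1})], i.e. minus a variance penalty on f; this is the correction that standard DynP is missing.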

  15. Continuous Time

The discrete time results extend immediately to continuous time.

• Now X is a controlled continuous time Markov process with controlled infinitesimal generator

    A^u g(t,x) = \lim_{h \to 0} \frac{1}{h} \big\{ E_{t,x}[\, g(t+h, X^u_{t+h}) \,] - g(t,x) \big\}

• The extended HJB is now an equation with time step [t, t+h].
• Divide the discrete time HJB equations by h and let h \to 0.

  16. Extended HJB Continuous Time

Conjecture: The equilibrium value function satisfies the system

    \sup_u \big\{ A^u V(t,x) - A^u (G \circ f)(t,x) + G'(f(t,x)) \cdot A^u f(t,x) \big\} = 0, \quad V(T,x) = F(x) + G(x)

    A^{\hat{\mathbf{u}}} f(t,x) = 0, \quad f(T,x) = x.

Note the fixed point character of the extended HJB.

  17. General Problem

    \max_u \; E_{t,x}\Big[ \int_t^T C(t, x, X_s, u_s)\, ds + F(t, x, X_T) \Big] + G\big(t, x, E_{t,x}[X_T]\big)

  18. The general case

    \sup_{u \in U} \Big\{ A^u V(t,x) + C(x, x, u) - \int_t^T (A^u c^s)_t(x, x)\, ds + \int_t^T (A^u c^{s,x})_t(x)\, ds
        - (A^u f)(t, x, x) + (A^u f^x)(t, x) - A^u (G \diamond g)(t,x) + (H^u g)(t,x) \Big\} = 0,

    A^{\hat{\mathbf{u}}} f^y(t,x) = 0, \quad A^{\hat{\mathbf{u}}} g(t,x) = 0, \quad (A^{\hat{\mathbf{u}}} c^{s,y})_t(x) = 0, \quad 0 \le t \le s,

    V(T,x) = F(x,x) + G(x,x),

    c^{s,y}_s(x) = C(x, y, \hat u_s(x)),

    f(T, x, y) = F(y, x), \quad g(T,x) = x.

  19. Optimal for what?

• In continuous time, it is not immediately clear how to define an equilibrium strategy.
• We follow Ekeland et al. and define the equilibrium using spike variations.

  20. HJB as a Necessary Condition

Conjecture: Assume that there exists an equilibrium control \hat{\mathbf{u}}, and define V and f as above. Then V and f satisfy the extended HJB system.

Note: It is probably very hard to prove this, due to technical problems. We do however have a converse result.

  21. Verification Theorem

Theorem: Assume that V, f and \hat{\mathbf{u}} satisfy the extended HJB system. Then V is the equilibrium value function and \hat{\mathbf{u}} is the equilibrium control.

Proof: Not very hard, but a bit harder than for standard DynP.

  22. A useful Lemma

Consider a functional

    J(t, x, u) = E_{t,x}[F(x, X^u_T)] + G\big(x, E_{t,x}[X^u_T]\big)

and denote the equilibrium control and value function by \hat u and V respectively. Let \varphi(x) be a given deterministic real valued function and consider the functional

    J^\varphi(t, x, u) = \varphi(x) \big\{ E_{t,x}[F(x, X^u_T)] + G\big(x, E_{t,x}[X^u_T]\big) \big\}

Denoting the corresponding equilibrium control and value function by \hat u^\varphi and V^\varphi respectively, we have

    \hat u^\varphi(t,x) = \hat u(t,x), \quad V^\varphi(t,x) = \varphi(x) V(t,x)

  23. Practical handling of the theory

• Make a parameterized Ansatz for V.
• Make a parameterized Ansatz for f.
• Plug everything into the extended HJB system and hope to obtain a system of ODEs for the parameters in the Ansatz.
• Alternatively, compute Lie symmetry groups.

  24. Basak’s Example (in a simple version)

    dS_t = \alpha S_t\, dt + \sigma S_t\, dW_t, \quad dB_t = r B_t\, dt

X_t = portfolio value process
u = amount of money invested in the risky asset

Problem:

    \max_u \; E_{t,x}[X_T] - \frac{\gamma}{2} \mathrm{Var}_{t,x}(X_T)

    dX_t = [r X_t + (\alpha - r) u_t]\, dt + \sigma u_t\, dW_t

This corresponds to our standard problem with

    F(x) = x - \frac{\gamma}{2} x^2, \quad G(x) = \frac{\gamma}{2} x^2

  25. Extended HJB

    V_t + \sup_u \Big\{ [r x + (\alpha - r) u] V_x + \frac{1}{2} \sigma^2 u^2 V_{xx} - \frac{\gamma}{2} \sigma^2 u^2 f_x^2 \Big\} = 0, \quad V(T,x) = x

    A^{\hat{\mathbf{u}}} f = 0, \quad f(T,x) = x

Ansatz:

    V(t,x) = g(t) x + h(t), \quad f(t,x) = A(t) x + B(t)
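Carrying the Ansatz through is a short calculation: the first-order condition gives u = (alpha-r)g/(gamma*sigma^2*A^2), and matching terms yields g' + rg = 0 and A' + rA = 0 with g(T) = A(T) = 1, hence g(t) = A(t) = exp(r(T-t)), u_hat(t) = (alpha-r)/(gamma*sigma^2) * exp(-r(T-t)) and h(t) = (alpha-r)^2 (T-t)/(2*gamma*sigma^2), matching the Basak–Chabakauri equilibrium control. A numerical sanity check of this solution (parameter values are illustrative):

```python
import numpy as np

# Check that V = g(t)x + h(t), f = A(t)x + B(t) with
#   g(t) = A(t) = exp(r(T-t)),  h(t) = (alpha-r)^2 (T-t)/(2 gamma sigma^2)
# makes the extended-HJB bracket vanish at u_hat and be maximal there.
r, alpha, sigma, gamma, T = 0.03, 0.08, 0.2, 2.0, 1.0

def u_hat(t):
    return (alpha - r) / (gamma * sigma**2) * np.exp(-r * (T - t))

def bracket(t, x, u):
    """V_t + [rx + (alpha-r)u] V_x + (1/2) sigma^2 u^2 V_xx
       - (gamma/2) sigma^2 u^2 f_x^2, using V_x = f_x = g, V_xx = 0."""
    g = np.exp(r * (T - t))
    V_t = -r * x * g - (alpha - r) ** 2 / (2 * gamma * sigma**2)   # h'(t) term
    return V_t + (r * x + (alpha - r) * u) * g \
        - 0.5 * gamma * sigma**2 * u**2 * g**2

residual = max(abs(bracket(t, x, u_hat(t)))
               for t in np.linspace(0.0, T, 11)
               for x in (-1.0, 0.5, 2.0))
print(residual)   # ~ 0: the Ansatz solves the extended HJB
```

Since the bracket is strictly concave in u (its u^2 coefficient is negative), vanishing at u_hat together with the first-order condition confirms that the sup is attained there.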
