Solving Hamilton-Jacobi-Bellman equations by combining a max-plus linear approximation and a probabilistic numerical method

Solving Hamilton-Jacobi-Bellman equations by combining a max-plus linear approximation and a probabilistic numerical method
Marianne Akian
INRIA Saclay - Île-de-France and CMAP, École polytechnique, CNRS
RICAM Workshop: Numerical methods for Hamilton-Jacobi equations in optimal control and related fields
Linz, November 21-25, 2016
Joint work with Eric Fodjo, see arXiv:1605.02816

A finite horizon diffusion control problem involving "discrete" and "continuum" controls
The state $\xi_s \in \mathbb{R}^d$ satisfies the stochastic differential equation
$$d\xi_s = f_{\mu_s}(\xi_s, u_s)\,ds + \sigma_{\mu_s}(\xi_s, u_s)\,dW_s,$$
where $(W_s)_{s\ge 0}$ is a d-dimensional Brownian motion, and $\mu := (\mu_s)_{0\le s\le T}$ and $u := (u_s)_{0\le s\le T}$ are admissible control processes, with $\mu_s \in \mathcal{M}$ a finite set and $u_s \in \mathcal{U} \subset \mathbb{R}^p$.
The problem consists in maximizing the finite horizon discounted payoff ($\delta_m \ge 0$):
$$J(t,x,\mu,u) := \mathbb{E}\Big[\int_t^T e^{-\int_t^s \delta_{\mu_\tau}(\xi_\tau,u_\tau)\,d\tau}\,\ell_{\mu_s}(\xi_s,u_s)\,ds + e^{-\int_t^T \delta_{\mu_\tau}(\xi_\tau,u_\tau)\,d\tau}\,\psi(\xi_T) \;\Big|\; \xi_t = x\Big].$$
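
To make the payoff concrete, here is a minimal Monte Carlo sketch of J(t, x, µ, u) for one fixed, constant pair of controls (the mode m and the value u are folded into the coefficient callables). It assumes a scalar state (d = 1), an Euler-Maruyama time step and a left-point rule for both time integrals; the function name estimate_payoff and its parameters are illustrative, not taken from the talk.

```python
import math
import random


def estimate_payoff(t, x, T, f, sigma, ell, delta, psi, n_steps=200, n_paths=2000, seed=0):
    """Monte Carlo estimate of J(t, x, mu, u) for one fixed, constant pair of controls.

    Assumption: d = 1, and the chosen mode m and control value u are folded into the
    callables f, sigma, ell, delta (functions of the state only); psi is the final payoff."""
    rng = random.Random(seed)
    h = (T - t) / n_steps
    total = 0.0
    for _ in range(n_paths):
        xi, disc, running = x, 0.0, 0.0  # disc approximates int_t^s delta d tau
        for _ in range(n_steps):
            running += math.exp(-disc) * ell(xi) * h  # left-point rule for the ds integral
            disc += delta(xi) * h
            xi += f(xi) * h + sigma(xi) * rng.gauss(0.0, math.sqrt(h))  # Euler-Maruyama step
        total += running + math.exp(-disc) * psi(xi)
    return total / n_paths
```

For example, with f ≡ 0, σ ≡ 1, ℓ ≡ 0, δ ≡ 0 and ψ(x) = x, the estimate stays close to the initial point x, as expected for a martingale.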

The Hamilton-Jacobi-Bellman (HJB) equation
Define the value function $v : [0,T]\times\mathbb{R}^d \to \mathbb{R}$ as $v(t,x) = \sup_{\mu,u} J(t,x,\mu,u)$.
Under suitable assumptions, it is the unique (continuous) viscosity solution of the HJB equation
$$-\frac{\partial v}{\partial t} - H\big(x, v(t,x), Dv(t,x), D^2 v(t,x)\big) = 0, \quad x\in\mathbb{R}^d,\ t\in[0,T),$$
$$v(T,x) = \psi(x), \quad x\in\mathbb{R}^d,$$
also satisfying some growth condition at infinity (in space), where the Hamiltonian $H : \mathbb{R}^d\times\mathbb{R}\times\mathbb{R}^d\times\mathbb{S}_d \to \mathbb{R}$ is given by
$$H(x,r,p,\Gamma) := \max_{m\in\mathcal{M}} H^m(x,r,p,\Gamma),$$
with
$$H^m(x,r,p,\Gamma) := \max_{u\in\mathcal{U}} \Big(\tfrac{1}{2}\operatorname{tr}\big(\sigma_m(x,u)\sigma_m(x,u)^{\mathsf T}\Gamma\big) + f_m(x,u)\cdot p - \delta_m(x,u)\,r + \ell_m(x,u)\Big).$$
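
For d = 1 the Hamiltonian can be evaluated by brute force, taking the maximum over the finite set M and over a finite grid standing in for U. The sketch below assumes that discretization of U (so it only approximates the maximum over u) and represents each mode as a hypothetical dict of coefficient callables; the name hamiltonian is illustrative.

```python
import math


def hamiltonian(x, r, p, gamma, modes, u_grid):
    """H(x, r, p, Gamma) for d = 1 (p and Gamma are scalars).

    Assumption: each mode is a dict of callables f, sigma, delta, ell of (x, u), and the
    max over U is replaced by a max over the finite list u_grid."""
    best = -math.inf
    for mode in modes:
        for u in u_grid:
            s = mode["sigma"](x, u)
            value = (0.5 * s * s * gamma          # (1/2) tr(sigma sigma^T Gamma) for d = 1
                     + mode["f"](x, u) * p
                     - mode["delta"](x, u) * r
                     + mode["ell"](x, u))
            best = max(best, value)
    return best
```

With one mode f(x, u) = u, σ ≡ 1, δ ≡ 0, ℓ(x, u) = -u²/2, the inner maximum is Γ/2 + p²/2 whenever the maximizer u = p lies on the grid.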

Standard grid-based discretizations of HJB equations suffer from the curse of dimensionality: for an error of $\epsilon$, the computing time of finite difference or finite element methods is at least of order $(1/\epsilon)^{d/2}$.
Some possibly curse-of-dimensionality-free methods:
◮ Idempotent methods, introduced by McEneaney (2007) in the deterministic case, and by McEneaney, Kaise and Han (2011) in the stochastic case.
◮ Probabilistic numerical methods based on a backward stochastic differential equation interpretation of the HJB equation, simulations and regressions:
  ◮ Quantization: Bally, Pagès (2003) for stopping time problems.
  ◮ Introduction of a new process without control: Bouchard, Touzi (2004) when σ does not depend on the control; Cheridito, Soner, Touzi and Victoir (2007) and Fahim, Touzi and Warin (2011) in the fully nonlinear case.
  ◮ Control randomization: Kharroubi, Langrené, Pham (2013).
  ◮ Fixed point iterations: Bender, Zhang (2008) for semilinear PDEs (which are not HJB equations).

The idempotent method of McEneaney, Kaise and Han
Given m and u, denote by $\hat\xi^{m,u}$ the Euler discretization of the process $\xi$ with time step h:
$$\hat\xi^{m,u}(t+h) = \hat\xi^{m,u}(t) + f_m\big(\hat\xi^{m,u}(t),u\big)\,h + \sigma_m\big(\hat\xi^{m,u}(t),u\big)\,(W_{t+h}-W_t).$$
Define the dynamic programming operators:
$$T^m_{t,h}(\phi)(x) = \sup_{u\in\mathcal{U}}\Big(h\,\ell_m(x,u) + e^{-h\delta_m(x,u)}\,\mathbb{E}\big[\phi(\hat\xi^{m,u}(t+h)) \mid \hat\xi^{m,u}(t) = x\big]\Big)$$
and
$$T_{t,h}(\phi)(x) = \max_{m\in\mathcal{M}} T^m_{t,h}(\phi)(x).$$
The HJB equation can be discretized in time by
$$v^h(t,x) = T_{t,h}\big(v^h(t+h,\cdot)\big)(x), \quad t\in\mathcal{T}_h := \{0, h, 2h, \dots, T-h\}.$$
Under appropriate assumptions, this scheme converges to the solution of the HJB equation when h goes to zero.
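
A one-dimensional sketch of the time-discretized scheme. As an assumption not made in the talk, the Gaussian increment $W_{t+h}-W_t$ is replaced by a Bernoulli variable equal to ±√h (same mean and variance) and $\mathcal{U}$ by a finite grid; dp_operator and scheme_value are hypothetical names. The direct recursion also shows why this is not practical: the number of evaluations grows like (2 · #M · #grid)^(T/h).

```python
import math


def dp_operator(phi, x, h, modes, u_grid):
    """T_{t,h}(phi)(x) for d = 1 (the coefficients do not depend on t here).

    Assumptions: W_{t+h} - W_t is replaced by +/- sqrt(h) with probability 1/2 each,
    the sup over U by a max over the finite list u_grid, and each mode is a dict of
    callables f, sigma, delta, ell of (x, u)."""
    increments = (math.sqrt(h), -math.sqrt(h))
    best = -math.inf
    for mode in modes:
        for u in u_grid:
            mean_next = x + mode["f"](x, u) * h  # Euler step without the noise
            s = mode["sigma"](x, u)
            expectation = sum(phi(mean_next + s * w) for w in increments) / 2
            value = h * mode["ell"](x, u) + math.exp(-h * mode["delta"](x, u)) * expectation
            best = max(best, value)
    return best


def scheme_value(psi, x, n_steps, h, modes, u_grid):
    """v^h(T - n_steps * h, x) by the backward recursion v^h(t) = T_{t,h}(v^h(t + h)).

    The number of calls grows like (2 * len(modes) * len(u_grid)) ** n_steps."""
    if n_steps == 0:
        return psi(x)
    return dp_operator(lambda y: scheme_value(psi, y, n_steps - 1, h, modes, u_grid),
                       x, h, modes, u_grid)
```

For instance, for a pure diffusion (f ≡ 0, σ ≡ 1, no control) and ψ(y) = y², each step adds exactly h, so scheme_value returns x² + n·h after n steps.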

◮ In the deterministic case ($\sigma_m = 0$), $T^m_{t,h}$ and $T_{t,h}$ are max-plus linear:
$$v^h(t+h,x) = \max_{i=1,\dots,N}\big(\lambda_i + q^{t+h}_i(x)\big)\ \forall x \;\Longrightarrow\; v^h(t,x) = \max_{i=1,\dots,N}\big(\lambda_i + q^t_i(x)\big)\ \forall x, \quad \text{with } q^t_i = T_{t,h}(q^{t+h}_i).$$
◮ We therefore only need to compute the effect of the dynamic programming operator on the finite basis $q^T_i$, $i = 1,\dots,N$, for instance by computing their projections on a fixed basis (see Fleming and McEneaney (2000) and Akian, Gaubert, Lakhoua (2008)).
◮ However, the $q^T_i$ are difficult to compute in general, or the size of the basis needs to be exponential in d.
◮ If $T^m_{t,h}(q)$ is a quadratic form whenever q is a quadratic form and is easy to compute (for instance when the $H^m$ correspond to linear quadratic problems), and if the $q^T_i$ are quadratic forms, then the $q^t_i$ are finite suprema of quadratic forms that are easy to compute (see McEneaney (2006)). The number of quadratic forms for $v^h(0,\cdot)$ is then exponential in the number of time steps only.
◮ This idea was extended to the stochastic case by McEneaney, Kaise and Han (2011).
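
The max-plus linearity can be checked numerically. The sketch below takes d = 1, $\sigma_m = 0$, a finite grid for $\mathcal{U}$ and, as an extra assumption, $\delta_m = 0$, so that an additive constant $\lambda_i$ passes through the operator unchanged. The modes, the grid and the basis functions q_i(y) = -(y - c_i)² are hypothetical choices, as are the helper names.

```python
def det_operator(phi, x, h, modes, u_grid):
    """Deterministic operator (sigma_m = 0, delta_m = 0), d = 1: the max over modes
    (f, ell) and over u in u_grid of h * ell(x, u) + phi(x + f(x, u) * h)."""
    return max(h * ell(x, u) + phi(x + f(x, u) * h)
               for f, ell in modes for u in u_grid)


def make_bump(center):
    """Hypothetical basis function q_i(y) = -(y - center)^2."""
    return lambda y: -(y - center) ** 2


def maxplus_combination(lambdas, basis):
    """The function y -> max_i (lambda_i + q_i(y))."""
    return lambda y: max(lam + qi(y) for lam, qi in zip(lambdas, basis))
```

Applying det_operator to maxplus_combination(lambdas, basis) gives, up to rounding, the same value as the maximum of lam + det_operator(qi, ...) over i, which is the identity above with $q^t_i = T_{t,h}(q^{t+h}_i)$.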

Theorem (McEneaney, Kaise and Han (2011))
Assume $\delta_m = 0$, $\sigma_m$ is constant, $f_m$ is affine, $\ell_m$ is concave quadratic (with respect to $(x,u)$), and $\psi$ is the supremum of a finite number of concave quadratic forms. Then, for all $t\in\mathcal{T}_h$, there exist a set $Z_t$ and a map $g_t : \mathbb{R}^d\times Z_t\to\mathbb{R}$ such that for all $z\in Z_t$, $g_t(\cdot,z)$ is a concave quadratic form and
$$v^h(t,x) = \sup_{z\in Z_t} g_t(x,z).$$
Moreover, the sets $Z_t$ satisfy
$$Z_t = \mathcal{M}\times\{\bar z_{t+h} : \mathcal{W}\to Z_{t+h} \mid \bar z_{t+h} \text{ Borel measurable}\},$$
where $\mathcal{W} = \mathbb{R}^d$ is the space of values of the Brownian process.
◮ Here a concave quadratic form is any map $\mathbb{R}^d\to\mathbb{R}$ of the form $x\mapsto q(x,z) := \frac{1}{2}x^{\mathsf T}Qx + b\cdot x + c$, with $z = (Q,b,c)\in\mathcal{Q}_d := \mathbb{S}^-_d\times\mathbb{R}^d\times\mathbb{R}$, where $\mathbb{S}^-_d$ denotes the set of negative semidefinite symmetric $d\times d$ matrices.
◮ The proof uses the max-plus (infinite) distributivity property.
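
The theorem rests on the fact that, when $f_m$ is affine and $\sigma_m$ constant, taking the expectation after one Euler step maps a concave quadratic form to another one. Below is a scalar (d = 1) sketch of that parameter update, assuming S(x, w) = αx + β + s·w with E[w] = 0 and E[w²] = h (for $f_m(x) = a x + f_0$ this means α = 1 + a h, β = f_0 h, s = $\sigma_m$). The names Quad, expect_after_affine_step and add_running_payoff are illustrative.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """z = (Q, b, c) with q(x, z) = Q x^2 / 2 + b x + c; concave iff Q <= 0 (d = 1)."""
    Q: float
    b: float
    c: float


def q(x, z):
    return 0.5 * z.Q * x * x + z.b * x + z.c


def expect_after_affine_step(z, alpha, beta, s, h):
    """Parameters of x -> E[q(alpha x + beta + s W, z)] when E[W] = 0 and E[W^2] = h.

    Only the first two moments of W enter, so the formula holds for a Gaussian
    increment as well as for a +/- sqrt(h) Bernoulli one.  The new quadratic
    coefficient alpha^2 Q is <= 0 whenever Q <= 0: concavity is preserved."""
    return Quad(Q=z.Q * alpha ** 2,
                b=alpha * (z.Q * beta + z.b),
                c=z.c + z.b * beta + 0.5 * z.Q * beta ** 2 + 0.5 * z.Q * s ** 2 * h)


def add_running_payoff(z, ell, h):
    """Parameters of x -> h * q(x, ell) + q(x, z), with ell_m itself a quadratic form."""
    return Quad(z.Q + h * ell.Q, z.b + h * ell.b, z.c + h * ell.c)
```

Composing the two updates gives the parameters of $x\mapsto h\,\ell_m(x) + \mathbb{E}[q(\hat\xi^m(t+h), z)\mid\hat\xi^m(t) = x]$, which is again a concave quadratic form.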

◮ In the deterministic case, the sets $Z_t$ are finite, and their cardinality is exponential in time: $\#Z_t = M\times\#Z_{t+h} = \dots = M^{N_t}\times\#Z_T$, with $M = \#\mathcal{M}$ and $N_t = (T-t)/h$.
◮ In the stochastic case, the sets $Z_t$ are infinite as soon as $t < T$.
◮ If the Brownian process is discretized in space, then $\mathcal{W}$ can be replaced by a finite subset of fixed cardinality p, and the sets $Z_t$ become finite.
◮ Nevertheless, their cardinality increases doubly exponentially in time:
$$\#Z_t = M\times(\#Z_{t+h})^p = \dots = M^{\frac{p^{N_t}-1}{p-1}}\times(\#Z_T)^{p^{N_t}},$$
where $p \ge 2$ ($p = 2$ for the Bernoulli discretization).
◮ McEneaney, Kaise and Han therefore proposed to apply a pruning method to reduce the cardinality of $Z_t$ at each time step $t\in\mathcal{T}_h$.
◮ In this talk, we shall replace pruning by random sampling.
◮ The idea is to use only quadratic forms that are optimal at the points of a sample of the process.
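
The two growth rates can be compared with exact integer arithmetic. The recursion below is the one stated above ($\#Z_t = M\times(\#Z_{t+h})^p$, with p = 1 for the deterministic case); card_recursive and card_closed_form are hypothetical helper names.

```python
def card_recursive(M, card_T, p, n_steps):
    """#Z_t after N_t = n_steps backward steps, via #Z_t = M * (#Z_{t+h}) ** p."""
    card = card_T
    for _ in range(n_steps):
        card = M * card ** p
    return card


def card_closed_form(M, card_T, p, n_steps):
    """Closed form M ** ((p**N - 1) / (p - 1)) * card_T ** (p**N), valid for p >= 2."""
    return M ** ((p ** n_steps - 1) // (p - 1)) * card_T ** (p ** n_steps)
```

For instance, with M = 2, #Z_T = 1 and the Bernoulli discretization (p = 2), five time steps already give #Z_t = 2^31 = 2147483648 quadratic forms, against 2^5 = 32 in the deterministic case (p = 1).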

Consider the case with no continuous control u and no discount factor.
◮ Then $\hat\xi^m(t+h) = S^m_{t,h}\big(\hat\xi^m(t), W_{t+h}-W_t\big)$ with $S^m_{t,h}(x,w) = x + f_m(x)h + \sigma_m(x)w$, and
$$T^m_{t,h}(\phi)(x) = h\,\ell_m(x) + \mathbb{E}\big[\phi\big(S^m_{t,h}(x, W_{t+h}-W_t)\big)\big].$$
◮ Assume that $\phi(x) = \max_{z\in Z_{t+h}} q(x,z)$, with $Z_{t+h}\subset\mathcal{Q}_d = \mathbb{S}^-_d\times\mathbb{R}^d\times\mathbb{R}$ and $q(x,z) := \frac{1}{2}x^{\mathsf T}Qx + b\cdot x + c$ for $z = (Q,b,c)\in\mathcal{Q}_d$.
◮ Then, for each $x\in\mathbb{R}^d$, there exists a measurable map $\bar z^m_x : \mathcal{W}\to Z_{t+h}$ such that
$$\phi\big(S^m_{t,h}(x, W_{t+h}-W_t)\big) = q\big(S^m_{t,h}(x, W_{t+h}-W_t),\ \bar z^m_x(W_{t+h}-W_t)\big).$$
◮ Moreover, under the previous assumptions on $\ell_m$, $f_m$ and $\sigma_m$, we have, for all $x'\in\mathbb{R}^d$,
$$\mathbb{E}\Big[h\,\ell_m(x') + q\big(S^m_{t,h}(x', W_{t+h}-W_t),\ \bar z^m_x(W_{t+h}-W_t)\big)\Big] = q(x', z^m_x)$$
for some $z^m_x\in\mathcal{Q}_d$, and so
$$T^m_{t,h}(\phi)(x) = q(x, z^m_x) = \sup_{x'\in\mathbb{R}^d} q(x, z^m_{x'}).$$
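
The last two bullets can be checked numerically in d = 1. The sketch assumes $f_m(x) = a x + f_0$, a constant $\sigma_m$, $\ell_m$ given as a concave quadratic form, and replaces $W_{t+h}-W_t$ by a Bernoulli variable ±√h, so that the optimal selection $\bar z^m_x$ only needs to be computed at two points. It returns both $T^m_{t,h}(\phi)(x)$ and the parameters $z^m_x$ of the quadratic form obtained by freezing that selection; the function name and its parameters are hypothetical.

```python
import math
from typing import NamedTuple


class Quad(NamedTuple):
    """z = (Q, b, c) with q(x, z) = Q x^2 / 2 + b x + c (d = 1)."""
    Q: float
    b: float
    c: float


def q(x, z):
    return 0.5 * z.Q * x * x + z.b * x + z.c


def operator_and_touching_form(x, Z_next, a, f0, sigma, ell, h):
    """Return (T^m_{t,h}(phi)(x), z^m_x) for phi = max_{z in Z_next} q(., z), one mode m.

    Assumptions: f_m(x) = a x + f0, sigma_m = sigma constant, ell_m = q(., ell),
    no discount, and W_{t+h} - W_t replaced by +/- sqrt(h) with probability 1/2."""
    alpha = 1.0 + a * h  # S(x', w) = alpha x' + (f0 h + sigma w)
    Q = b = c = 0.0
    value = h * q(x, ell)
    for w in (math.sqrt(h), -math.sqrt(h)):
        shift = f0 * h + sigma * w
        y = alpha * x + shift
        zbar = max(Z_next, key=lambda z: q(y, z))  # zbar^m_x(w): optimal form at S(x, w)
        value += 0.5 * q(y, zbar)                  # phi(S(x, w)) = q(S(x, w), zbar)
        # coefficients of x' -> q(alpha x' + shift, zbar), averaged over the two values of w
        Q += 0.5 * zbar.Q * alpha ** 2
        b += 0.5 * alpha * (zbar.Q * shift + zbar.b)
        c += 0.5 * (zbar.c + zbar.b * shift + 0.5 * zbar.Q * shift ** 2)
    z_x = Quad(Q + h * ell.Q, b + h * ell.b, c + h * ell.c)
    return value, z_x
```

On a grid of points x, one can then check that $q(\cdot, z^m_x)$ touches $T^m_{t,h}(\phi)$ at x and stays below it elsewhere, so that the maximum of $q(x, z^m_{x'})$ over the grid points x' recovers $T^m_{t,h}(\phi)(x)$ at every grid point. This is the property that the sampling algorithm exploits, with the grid replaced by sampled points.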

The sampling algorithm
◮ Let $M = \#\mathcal{M}$ and choose $N = (N_{in}, N_{rg})$ giving the sizes of the samples.
◮ Choose $Z_T\subset\mathcal{Q}_d$ such that $|\psi(x) - \max_{z\in Z_T} q(x,z)| \le \epsilon$. Define $v^{h,N}(T,x) = \max_{z\in Z_T} q(x,z)$ for $x\in\mathbb{R}^d$.
◮ Construct a sample of $\big((\hat\xi^m(0))_{m\in\mathcal{M}}, (W_{t+h}-W_t)_{t\in\mathcal{T}_h}\big)$ of size $N_{in}$, indexed by $\omega\in\Omega_{N_{in}} := \{1,\dots,N_{in}\}$, and deduce $\hat\xi^m(t,\omega)$, $m\in\mathcal{M}$.
◮ For $t = T-h, T-2h, \dots, 0$ do:
1. For each $\omega\in\Omega_{N_{in}}$ and $m\in\mathcal{M}$, denote $x = \hat\xi^m(t,\omega)$ and construct a subsample of size $N_{rg}$ of elements $(\omega_i,\omega'_i)$ of $(\Omega_{N_{in}})^2$, $i\in\Omega_{N_{rg}}$. Let $\bar z^m_x : \mathcal{W}\to Z_{t+h}$ (as above) be computed at the points $(W_{t+h}-W_t)(\omega'_i)$ only. Consider
$$\tilde q(x', w) = h\,\ell_m(x') + q\big(S^m_{t,h}(x', w), \bar z^m_x(w)\big).$$
Approximate the $z^m_x$ such that $q(x', z^m_x) = \mathbb{E}[\tilde q(x', W_{t+h}-W_t)]$ by a regression of $\tilde q(\hat\xi^m(t), W_{t+h}-W_t)$ on the (usual) basis of quadratic forms of $\hat\xi^m(t)$, using the sample $\big(\hat\xi^m(t,\omega_i), (W_{t+h}-W_t)(\omega'_i)\big)$, $i\in\Omega_{N_{rg}}$.
2. Let $Z_t$ be the set of the parameters $z^m_x\in\mathcal{Q}_d$ of all the quadratic forms obtained in Step 1. Define $v^{h,N}(t,x) = \max_{z\in Z_t} q(x,z)$.
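
A compact one-dimensional sketch of the algorithm, under simplifying assumptions not taken from the talk: d = 1, no continuous control and no discount, $f_m(x) = a x + f_0$ with constant $\sigma_m$, $\ell_m$ given as a quadratic form, Gaussian increments drawn with random.gauss, a regression subsample drawn uniformly with replacement for each (ω, m), and an ordinary least-squares fit on the basis x²/2, x, 1 solved through the normal equations. The names sampling_algorithm, fit_quadratic and x0_sampler are hypothetical. Nothing in this sketch forces the fitted Q to be nonpositive when $\sigma_m \ne 0$.

```python
import math
import random
from typing import NamedTuple


class Quad(NamedTuple):
    """z = (Q, b, c) with q(x, z) = Q x^2 / 2 + b x + c (d = 1)."""
    Q: float
    b: float
    c: float


def q(x, z):
    return 0.5 * z.Q * x * x + z.b * x + z.c


def fit_quadratic(xs, ys):
    """Least-squares fit ys ~ q(xs, z) on the basis (x^2/2, x, 1) via normal equations."""
    rows = [(0.5 * x * x, x, 1.0) for x in xs]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    for col in range(3):  # Gaussian elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, 3):
            fac = A[r][col] / A[col][col]
            for k in range(col, 3):
                A[r][k] -= fac * A[col][k]
            rhs[r] -= fac * rhs[col]
    sol = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        sol[i] = (rhs[i] - sum(A[i][k] * sol[k] for k in range(i + 1, 3))) / A[i][i]
    return Quad(*sol)


def sampling_algorithm(modes, Z_T, T, h, n_in, n_rg, x0_sampler, seed=0):
    """modes: list of (a, f0, sigma, ell) with f_m(x) = a x + f0, sigma_m = sigma and
    ell_m = q(., ell).  Returns Z_0 and the sampled initial points xi^m(0, omega)."""
    rng = random.Random(seed)
    n_steps = round(T / h)
    # one sample of the Brownian increments, indexed by (time step, omega)
    dW = [[rng.gauss(0.0, math.sqrt(h)) for _ in range(n_in)] for _ in range(n_steps)]
    xi = []  # xi[m][k][omega] = xi^m(k h, omega), Euler scheme driven by dW
    for a, f0, sigma, _ in modes:
        path = [[x0_sampler(rng) for _ in range(n_in)]]
        for k in range(n_steps):
            path.append([x + (a * x + f0) * h + sigma * w for x, w in zip(path[k], dW[k])])
        xi.append(path)
    Z = list(Z_T)
    for k in reversed(range(n_steps)):  # t = k h = T - h, ..., 0
        Z_new = []
        for m, (a, f0, sigma, ell) in enumerate(modes):
            def step(x, w):  # S^m_{t,h}(x, w)
                return x + (a * x + f0) * h + sigma * w
            for omega in range(n_in):
                x = xi[m][k][omega]
                xs, ys = [], []
                for _ in range(n_rg):
                    i, i2 = rng.randrange(n_in), rng.randrange(n_in)  # (omega_i, omega'_i)
                    w = dW[k][i2]
                    y = step(x, w)
                    zbar = max(Z, key=lambda z: q(y, z))  # zbar^m_x(w), optimal at S(x, w)
                    x_i = xi[m][k][i]
                    xs.append(x_i)
                    ys.append(h * q(x_i, ell) + q(step(x_i, w), zbar))  # q~(x_i, w)
                Z_new.append(fit_quadratic(xs, ys))
        Z = Z_new
    return Z, [x for path in xi for x in path[0]]
```

In the deterministic case ($\sigma_m = 0$) with a single mode and $\ell_m = 0$, the fitted quadratics are exact: $v^{h,N}(0,\cdot)$ then coincides with the exact discrete value at the sampled initial points and stays below it elsewhere. With noise, the regression only gives an approximation whose quality depends on $N_{in}$ and $N_{rg}$.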
