A near model-free method for solving the Hamilton-Jacobi-Bellman equation in high dimensions
Mathias Oster, Leon Sallandt, Reinhold Schneider
Technische Universität Berlin
ICODE Workshop on numerical solutions of HJB equations, 10.01.2020
Motivation and Ingredients
Aim: Calculate optimal feedback laws (via HJB) for controlled PDEs.
Ingredients:
1. Reformulate the HJB equation as an operator equation.
2. Use Monte Carlo integration for a least squares approximation.
3. Use a nonlinear, smooth ansatz space: HT/TT tree-based tensors.
Mathias Oster (TU Berlin), Solve HJB in high-dimensions, ICODE, 2 / 21
Classical optimal control problem
Optimal control problem: find u \in L^2(0, \infty) such that

\min_u J(x, u) = \min_u \int_0^\infty \frac{1}{2}\|x(s)\|^2_{\mathbb{R}^n} + \frac{\lambda}{2}|u(s)|^2 \, ds,

subject to \dot x = f(x, u), x \in \Omega \subset \mathbb{R}^n, x(0) = x_0.

1. Note that the differential equation can be high-dimensional.
2. Linear ODE and quadratic cost → Riccati equation.
3. Nonlinear ODE and nonlinear cost → Hamilton-Jacobi-Bellman (HJB) equation.
Feedback control problem
Define a feedback law \alpha(x(t)) = u(t). Rephrase

\min_\alpha J_\alpha(x) = \min_\alpha \int_0^\infty \underbrace{\frac{1}{2}\|x(s, \alpha)\|^2_{\mathbb{R}^n} + \frac{\lambda}{2}|(\alpha(x))(s)|^2}_{=: r_\alpha(x)} \, ds.

Our goal: find an optimal feedback law \alpha^*(x) = u. Define the value function

v(x) := \inf_\alpha J_\alpha(x) \in \mathbb{R}.

Idea: if v is differentiable, the feedback law is given by

\alpha(x) = -\frac{1}{\lambda} D_x v(x) \circ D_u f(x, u) \quad (easy to calculate!).
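For control-affine dynamics f(x, u) = Ax + Bu, the feedback formula above reduces to \alpha(x) = -(1/\lambda) B^T \nabla v(x). A minimal sketch, assuming a known quadratic value function; the matrices P, B and the weight lam below are illustrative toy choices, not values from the talk:

```python
# feedback law alpha(x) = -(1/lam) * (D_u f)^T grad v(x)
# v(x) = 0.5 * x^T P x is an assumed (toy) value function;
# f(x, u) = A x + B u, so only D_u f = B enters the feedback
lam = 1.0
P = [[2.0, 0.5], [0.5, 1.0]]
B = [[1.0], [0.0]]

def grad_v(x):
    # gradient of the quadratic value function: P x
    return [sum(P[i][j] * x[j] for j in range(2)) for i in range(2)]

def alpha(x):
    # u = -(1/lam) * B^T grad v(x)
    g = grad_v(x)
    return [-(1.0 / lam) * sum(B[i][0] * g[i] for i in range(2))]

u = alpha([1.0, -1.0])
```

Any differentiable ansatz for v can be plugged into `grad_v`; this is what makes the gradient-based policy update cheap once v is known.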
The HJB equation
The value function obeys

\inf_\alpha \left\{ f(x, \alpha(x)) \cdot \nabla v(x) + r_\alpha(x) \right\} = 0.

The HJB equation is highly nonlinear and potentially high-dimensional! But: for a fixed policy \alpha(x) it reduces to a linear equation. Defining L_\alpha := -f(x, \alpha) \cdot \nabla, we get

L_\alpha v_\alpha(x) - r_\alpha(x) = 0.
Method of characteristics
Linearized HJB: L_\alpha v_\alpha(x) - r_\alpha(x) = 0. Using the method of characteristics we obtain

\dot x(t) = f(x, \alpha),
v_\alpha(x(0)) = \int_0^\tau r_\alpha(x(t)) \, dt + v_\alpha(x(\tau)),

which we call the Bellman-like equation.
Reformulation as Operator Equation
Consider the Koopman operator

K^\alpha_\tau : L_{loc,\infty}(\Omega) \to L_{loc,\infty}(\Omega), \quad K^\alpha_\tau[g](x) = g(x(\tau)).

Rewrite the Bellman-like equation: for all x \in \Omega,

v_\alpha(x(0)) = \int_0^\tau r_\alpha(x(t)) \, dt + v_\alpha(x(\tau)),

as

(\mathrm{Id} - K^\alpha_\tau)[v](x) = \underbrace{\int_0^\tau r(x(t)) \, dt}_{=: R^\alpha_\tau(x)}.
Policy iteration
Policy iteration uses a sequence of linearized HJB equations.

Algorithm (Policy iteration). Initialize with a stabilizing feedback \alpha_0. Solve until convergence:
1. Find v_{i+1} such that (\mathrm{Id} - K^{\alpha_i}_\tau) v_{i+1}(\cdot) - R^{\alpha_i}_\tau(\cdot) = 0.
2. Update the policy according to \alpha_{i+1}(x) = -\frac{1}{\lambda} D_x v_{i+1}(x) \circ D_u f(x, u).
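The algorithm above can be sketched on a scalar LQR toy problem, where each policy-evaluation step has a closed form and the iteration converges to the Riccati solution. All constants below (a, b, lam, the initial gain) are illustrative choices, not values from the talk:

```python
from math import sqrt

# policy iteration for the scalar LQR problem
#   dynamics: xdot = a*x + b*u,  cost: 0.5*x^2 + 0.5*lam*u^2
# with linear feedback u = -k*x and quadratic value v(x) = 0.5*p*x^2
a, b, lam = 1.0, 1.0, 1.0
k = 2.0                              # initial stabilizing gain (a - b*k < 0)
for _ in range(30):
    # policy evaluation: the linearized HJB  (a-b*k)*p + 0.5*(1 + lam*k^2) = 0
    p = (1.0 + lam * k**2) / (2.0 * (b * k - a))
    # policy update: u = -(1/lam) * D_x v * D_u f = -(b*p/lam) * x
    k = b * p / lam

# fixed point solves the scalar Riccati equation 2*a*p - b^2*p^2/lam + 1 = 0
p_riccati = lam * (a + sqrt(a**2 + b**2 / lam)) / b**2   # = 1 + sqrt(2) here
```

Each evaluation step is linear in p, which mirrors the key point of the slide: for a fixed policy the HJB equation becomes linear, and only the update step reintroduces the nonlinearity.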
Least squares ansatz
Problem: we need to solve (\mathrm{Id} - K^{\alpha_i}_\tau) v_{\alpha_{i+1}}(\cdot) - R^{\alpha_i}_\tau(\cdot) = 0.
Idea: solve on a suitable ansatz space S:

v_{\alpha_{i+1}} = \arg\min_{v \in S} \|(\mathrm{Id} - K^{\alpha_i}_\tau) v(\cdot) - R^{\alpha_i}_\tau(\cdot)\|^2_{L^2(\Omega)} = \arg\min_{v \in S} \int_\Omega |(\mathrm{Id} - K^{\alpha_i}_\tau) v(x) - R^{\alpha_i}_\tau(x)|^2 \, dx.
Projected Policy iteration
Algorithm (Projected policy iteration). Initialize with a stabilizing feedback \alpha_0. Solve until convergence:
1. Find v_{i+1} = \arg\min_{v \in S} \|(\mathrm{Id} - K^{\alpha_i}_\tau) v(\cdot) - R^{\alpha_i}_\tau(\cdot)\|^2_{L^2(\Omega)}.
2. Update the policy according to \alpha_{i+1}(x) = -\frac{1}{\lambda} D_x v_{i+1}(x) \circ D_u f(x, u).
Variational Monte Carlo
Approximate by Monte Carlo quadrature:

\|(\mathrm{Id} - K^{\alpha_i}_\tau) v(\cdot) - R^{\alpha_i}_\tau(\cdot)\|^2_{L^2(\Omega)} \approx \frac{1}{n} \sum_{j=1}^n |(\mathrm{Id} - K^{\alpha_i}_\tau) v(x_j) - R^{\alpha_i}_\tau(x_j)|^2,

v^*_{n,s} = \arg\min_{v \in S} \frac{1}{n} \sum_{j=1}^n |(\mathrm{Id} - K^{\alpha_i}_\tau) v(x_j) - R^{\alpha_i}_\tau(x_j)|^2.

Proposition ([Eigel, Schneider et al. '19]). Let \epsilon > 0 be such that \inf_{v_s \in S} \|v^* - v_s\|^2_{L^2(\Omega)} \le \epsilon. Then

P[\|v^* - v^*_{n,s}\|^2_{L^2(\Omega)} > \epsilon] \le c_1(\epsilon) e^{-c_2(\epsilon) n} \quad with c_1, c_2 > 0.

Exponential decay in the number of samples.
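A minimal sketch of the Monte Carlo least-squares step: fit a function in a small polynomial ansatz space from random samples by solving the empirical normal equations. The target function and basis below are illustrative; in the talk the residual involves the Koopman operator rather than a direct function fit:

```python
import random

# target v*(x) = 3x^2 - x lies in span{1, x, x^2}, so the fit recovers it exactly
random.seed(1)
target = lambda x: 3 * x**2 - x
basis = [lambda x: 1.0, lambda x: x, lambda x: x**2]
samples = [random.uniform(0.0, 1.0) for _ in range(200)]

# empirical normal equations (B^T B) c = B^T y of the Monte Carlo L2 loss
m = len(basis)
G = [[sum(bi(x) * bj(x) for x in samples) for bj in basis] for bi in basis]
y = [sum(bi(x) * target(x) for x in samples) for bi in basis]

# solve the small linear system by Gaussian elimination with partial pivoting
for col in range(m):
    piv = max(range(col, m), key=lambda r: abs(G[r][col]))
    G[col], G[piv] = G[piv], G[col]
    y[col], y[piv] = y[piv], y[col]
    for r in range(col + 1, m):
        f = G[r][col] / G[col][col]
        for c2 in range(col, m):
            G[r][c2] -= f * G[col][c2]
        y[r] -= f * y[col]
coeff = [0.0] * m
for r in range(m - 1, -1, -1):
    coeff[r] = (y[r] - sum(G[r][c2] * coeff[c2] for c2 in range(r + 1, m))) / G[r][r]
# coeff approximates (0, -1, 3)
```

Because the target lies in the ansatz space, the best-approximation error \epsilon is zero here; the proposition then says the Monte Carlo minimizer concentrates on the true coefficients as n grows.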
Solving the VMC equation

\arg\min \sum_{j=1}^n |(\mathrm{Id} - K^{\alpha_i}_\tau) v(x_j) - R^{\alpha_i}_\tau(x_j)|^2.

1. v(x_j) → evaluate v at the samples x_j.
2. K^{\alpha_i}_\tau v(x_j) → evaluate v at the transported samples (with policy \alpha_i).
3. R^{\alpha_i}_\tau(x_j) → approximate the reward by the trapezoidal rule.

What do we need for solving the equation? A model-free solution is possible: only a black-box solver for the ODE is needed.
What do we need for updating the policy? We need D_u f(x, u), i.e. the derivative of the right-hand side w.r.t. the control.
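Steps 1-3 can be sketched with a black-box integrator: transport a sample over [0, \tau], accumulate the reward with the trapezoidal rule, and evaluate the residual (\mathrm{Id} - K_\tau)v - R_\tau. The dynamics \dot x = -x with its known value function v(x) = x^2/4 (for r(x) = x^2/2, u = 0) are toy choices; the residual should vanish for the true value function:

```python
# Bellman residual at sample points, using only a black-box ODE step
tau, dt = 1.0, 0.001
steps = int(tau / dt)

def step(x):
    # black-box integrator step (RK4) for xdot = f(x) = -x
    f = lambda y: -y
    k1 = f(x); k2 = f(x + 0.5*dt*k1); k3 = f(x + 0.5*dt*k2); k4 = f(x + dt*k3)
    return x + dt * (k1 + 2*k2 + 2*k3 + k4) / 6.0

def residual(v, x0):
    x, reward = x0, 0.0
    for _ in range(steps):
        x_new = step(x)
        # trapezoidal rule for R_tau(x0) = int_0^tau r(x(t)) dt, r(x) = 0.5*x^2
        reward += 0.5 * dt * (0.5 * x**2 + 0.5 * x_new**2)
        x = x_new
    # (Id - K_tau)[v](x0) - R_tau(x0), where K_tau[v](x0) = v(x(tau))
    return v(x0) - v(x) - reward

v_true = lambda x: x**2 / 4.0
res = [abs(residual(v_true, x0)) for x0 in (-1.0, 0.3, 2.0)]
```

Nothing here uses the dynamics beyond the `step` call, which is the model-free point of the slide: the same code works if `step` is replaced by any black-box simulator.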
Possible ansatz spaces
- Full linear space of polynomials
- Low-rank tensor manifolds
- Deep neural networks
Here used: low-rank tensor train (TT-tensor) manifold
- Riemannian manifold structure
- Explicit representation of the tangent space
- Convergence theory for optimization algorithms
Tensor Trains
Consider \Pi_i = (1, x_i, x_i^2, x_i^3, \dots, x_i^k), one-dimensional polynomials, and the tensor product \Pi = \bigotimes_{i=1}^n \Pi_i. Then \dim(\Pi) = (k+1)^n, huge if n \gg 0. Reduce the size of the ansatz space by considering a nonlinear manifold M \subset \Pi. In the tensor train format (here n = 4, with ranks r_1, r_2, r_3 between the cores A_1, \dots, A_4):

v(x) = \sum_{j_1, j_2, j_3} A_1(x_1, j_1) \, A_2(j_1, x_2, j_2) \, A_3(j_2, x_3, j_3) \, A_4(j_3, x_4).
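The contraction above can be sketched directly: evaluate v(x) by sweeping through the cores left to right, and compare against a brute-force expansion of the full coefficient tensor. The sizes, degree, and random cores below are illustrative toy choices:

```python
import random

k = 1                      # polynomial degree per variable: basis (1, x)
ranks = [1, 2, 2, 1]       # boundary ranks are 1
random.seed(0)
# core i has shape (ranks[i], k+1, ranks[i+1])
cores = [[[[random.uniform(-1, 1) for _ in range(ranks[i + 1])]
           for _ in range(k + 1)]
          for _ in range(ranks[i])]
         for i in range(3)]

def tt_eval(x):
    # contract cores left to right: running row vector times core slices
    vec = [1.0]
    for i, xi in enumerate(x):
        phi = [xi**a for a in range(k + 1)]          # basis evaluations
        new = [0.0] * ranks[i + 1]
        for r2 in range(ranks[i + 1]):
            for r1 in range(ranks[i]):
                for a in range(k + 1):
                    new[r2] += vec[r1] * phi[a] * cores[i][r1][a][r2]
        vec = new
    return vec[0]

def full_eval(x):
    # brute-force sum over all (k+1)^3 coefficients of the full tensor
    total = 0.0
    for a in range(k + 1):
        for b in range(k + 1):
            for c in range(k + 1):
                coeff = 0.0
                for r1 in range(ranks[1]):
                    for r2 in range(ranks[2]):
                        coeff += (cores[0][0][a][r1] * cores[1][r1][b][r2]
                                  * cores[2][r2][c][0])
                total += coeff * x[0]**a * x[1]**b * x[2]**c
    return total

pt = (0.3, -0.7, 0.5)
```

The sweep costs O(n (k+1) r^2) per evaluation instead of O((k+1)^n), which is what makes TT ansatz spaces usable in high dimensions.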
Cost functional
Modify the cost functional:

R_N(v) = \frac{1}{n} \sum_{j=1}^n |(\mathrm{Id} - K^{\alpha_i}_\tau) v(x_j) - R(x_j)|^2 + \underbrace{|v(0)|^2 + |\nabla v(0)|^2}_{\text{vanishes in exact case}} + \underbrace{\mu \|v\|^2_{H^1(\Omega)}}_{\text{regularizer}}.
Example: Schlögl-like equation
Consider a Schlögl-like system with Neumann boundary conditions, cf. [Dolgov, Kalise, Kunisch '19]. Solve for x \in \Omega = L^2(-1, 1):

\min_u J(x, u) = \min_u \int_0^\infty \frac{1}{2}\|x(s)\|^2 + \frac{\lambda}{2}|u(s)|^2 \, ds,

subject to \dot x(t) = \sigma \Delta x(t) + x(t)^3 + \chi_\omega u(t), x(0) = x_0,

where \chi_\omega is the characteristic function of \omega = [-0.4, 0.4]. After discretization in space (finite differences):

(\dot x_1, \dots, \dot x_n)^T = A (x_1, \dots, x_n)^T + (x_1^3, \dots, x_n^3)^T + b \, u,

where A is the discretized Laplacian and b discretizes \chi_\omega (entries 1 at grid points inside \omega, 0 outside).
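A minimal sketch of the finite-difference right-hand side with Neumann boundary conditions. Grid size, \sigma, and the mirror-point boundary treatment are illustrative choices, not the exact discretization used in the talk:

```python
# finite-difference rhs f(x, u) = sigma*Laplacian(x) + x^3 + chi_omega*u
# on a uniform grid over [-1, 1] with Neumann boundary conditions
n, sigma = 32, 0.2
h = 2.0 / (n - 1)
grid = [-1.0 + i * h for i in range(n)]

def rhs(x, u):
    f = []
    for i in range(n):
        # Neumann boundary: mirror the inner neighbour
        left = x[i - 1] if i > 0 else x[1]
        right = x[i + 1] if i < n - 1 else x[n - 2]
        lap = (left - 2 * x[i] + right) / h**2
        chi = 1.0 if -0.4 <= grid[i] <= 0.4 else 0.0
        f.append(sigma * lap + x[i]**3 + chi * u)
    return f

# sanity check: for a constant state the Laplacian vanishes,
# so f = c^3 outside omega and f = c^3 + u inside omega
c = 0.5
f = rhs([c] * n, 1.0)
```

This `rhs` is exactly the black-box object the method needs: it can be handed to any ODE solver, and D_u f is simply the vector of `chi` entries because the control enters linearly.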
Example: Schlögl-like equation
TT degrees of freedom: full space 5^{32}, reduced to ≈ 5000.
Example: Schlögl-like equation
[Figure: (a) Initial values x_0, x_1. (b) Generated cost and squared Bellman error |v(x_0) - J(x_0, \alpha(x_0))|^2, |v(x_1) - J(x_1, \alpha(x_1))|^2. Blue is Riccati, orange is V_{L^2}, green is V_{H^1}.]
Example: Schlögl-like equation
[Figure: (a) Generated controls over time for initial value x_0. (b) Generated controls over time for initial value x_1. Curves: Riccati, V_{L^2}, V_{H^1}.]
Figure: The generated controls for different initial values.
What do we need for optimization?
We only need
- a discretization of the flow \Phi (black box),
- the derivative of the rhs f(x, u) w.r.t. the control (easy if linear),
- the cost functional
to solve the equation and generate a feedback law.
Thank you for your attention!
References and related work
- Sergey Dolgov, Dante Kalise, and Karl Kunisch. A tensor decomposition approach for high-dimensional Hamilton-Jacobi-Bellman equations. arXiv:1908.01533, Aug 2019.
- Martin Eigel, Reinhold Schneider, Philipp Trunschke, and Sebastian Wolf. Variational Monte Carlo - bridging concepts of machine learning and high-dimensional partial differential equations. Advances in Computational Mathematics, Oct 2019.
- Mathias Oster, Leon Sallandt, and Reinhold Schneider. Approximating the stationary Hamilton-Jacobi-Bellman equation by hierarchical tensor products. Preprint, 2019.