The HJB-POD approach for infinite dimensional control problems
M. Falcone
Work in collaboration with A. Alla, D. Kalise and S. Volkwein
Università di Roma “La Sapienza”
Numerical methods for Hamilton-Jacobi equations in optimal control and related fields
RICAM, Linz, November 21, 2016
M. Falcone (Università di Roma “La Sapienza”) The HJB-POD approach 1 / 48
Outline
1. HJ equations, DP schemes and feedback synthesis
   HJ Equations
   Numerical scheme for HJ equation
2. Efficient numerical methods for HJ equations
   A quick overview
   Accelerated iterative schemes
3. HJB-POD method for high dimensional problem
   A-priori estimates for the HJB approximation
   A-priori estimates for the HJB-POD approximation
4. Numerical Tests
Introduction
The Dynamic Programming Principle allows one to derive a first-order partial differential equation describing the value function associated with the optimal control problem (in finite or infinite dimension).
The theory of viscosity solutions characterizes the value function as the unique weak solution of the Bellman equation.
This characterization has also been used to construct numerical schemes for the value function and to compute optimal feedbacks.
Reference: M. Bardi, I. Capuzzo Dolcetta, Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations, 1997.
DP's advantages and disadvantages
PROS
1. The characterization of the value function is valid for all classical problems in any dimension.
2. The approximation is based on a-priori error estimates in $L^\infty$ and is valid in any dimension.
3. DP (semi-Lagrangian) schemes can work on structured and unstructured grids.
4. The computation of feedbacks is almost built in and there are nice results in low dimension.
CONS
The "curse of dimensionality" makes the problem difficult to solve in high dimension due to
1. computational cost,
2. huge memory allocations.
HJB equation for the infinite horizon problem
Controlled dynamics and cost functional
\[
\begin{cases}
\dot y(t) = f(y(t), u(t)), & t \in (t_0, +\infty) \\
y(t_0) = x
\end{cases}
\]
Infinite horizon cost functional
\[
J_x(y, u) = \int_0^{+\infty} g(y(s), u(s))\, e^{-\lambda s}\, ds
\]
Value function
\[
v(x) := \inf_{u(\cdot) \in \mathcal{U}} J_x(y, u(\cdot)).
\]
Value and HJB equation: infinite horizon problem
Dynamic Programming Principle
\[
v(x) = \min_{u \in \mathcal{U}} \left\{ \int_{t_0}^{\tau} e^{-\lambda s} g(y_x(s), u(s))\, ds + v(y_x(\tau))\, e^{-\lambda \tau} \right\}
\]
By Dynamic Programming we get the stationary Bellman equation
\[
\lambda v(x) + \max_{u \in U} \{ -f(x, u) \cdot \nabla v(x) - g(x, u) \} = 0, \qquad x \in \mathbb{R}^n.
\]
Since the value function is in general not regular, we need to use weak solutions, typically Lipschitz continuous.
The value function is the unique viscosity solution of the Bellman equation.
The approximation scheme can be constructed via a discrete dynamic programming approach.
State constraints can also be included.
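A formal sketch of how the Bellman equation follows from the Dynamic Programming Principle. It assumes, beyond what the slide states, that $v$ is differentiable at $x$, that $t_0 = 0$, and that the control is held constant at $u$ on the short interval $[0, \tau]$; the rigorous statement is the viscosity one.

```latex
\begin{align*}
v(x) &= \min_{u \in U} \Big\{ \tau\, g(x,u)
        + (1 - \lambda \tau)\big( v(x) + \tau\, f(x,u) \cdot \nabla v(x) \big)
        + o(\tau) \Big\} \\
0 &= \min_{u \in U} \big\{ g(x,u) + f(x,u) \cdot \nabla v(x) - \lambda v(x) \big\} + o(1)
   \qquad \text{(subtract } v(x) \text{, divide by } \tau \text{)} \\
0 &= \lambda v(x) + \max_{u \in U} \big\{ -f(x,u) \cdot \nabla v(x) - g(x,u) \big\}
   \qquad (\tau \to 0)
\end{align*}
```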
Synthesis of feedback controls
The numerical solution of optimal control problems via HJB PDEs leads to the computation of feedback controls for generic nonlinear Lipschitz continuous vector fields and costs.
Solving
\[
\lambda v(x) + \max_{u \in U} \{ -f(x, u) \cdot \nabla v(x) - g(x, u) \} = 0, \qquad x \in \Omega
\]
we get the value function on a grid and we extend it to the whole domain.
Then we can also compute a feedback map $u^* : \Omega \to U$,
\[
u^*(x) \equiv \arg\min_{u \in U} \{ f(x, u) \cdot \nabla v(x) + g(x, u) \}, \qquad x \in \Omega
\]
which is used to compute optimal trajectories by an ODE scheme.
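A minimal sketch of the feedback synthesis step on a toy 1D problem. Everything specific here is an assumption for illustration, not taken from the slides: the dynamics $\dot y = u$ with $u \in \{-1, 0, 1\}$, the cost $g = x^2$, $\lambda = 1$, the hand-derived closed-form value function used in place of a computed one, the centered-difference gradient, and the helper names `interp`, `feedback` and `closed_loop`.

```python
import math

def interp(x0, dx, values, x):
    """Piecewise-linear (P1) interpolation on a uniform grid, clamped at the ends."""
    s = min(max((x - x0) / dx, 0.0), len(values) - 1.0)
    i = min(int(s), len(values) - 2)
    w = s - i
    return (1.0 - w) * values[i] + w * values[i + 1]

def feedback(x, x0, dx, V, controls, f, g):
    """u*(x) = argmin_u { f(x,u) * v'(x) + g(x,u) }, with v' from a centered difference."""
    grad = (interp(x0, dx, V, x + dx) - interp(x0, dx, V, x - dx)) / (2.0 * dx)
    return min(controls, key=lambda u: f(x, u) * grad + g(x, u))

def closed_loop(x_start, h, n_steps, x0, dx, V, controls, f, g):
    """Explicit Euler integration of y' = f(y, u*(y))."""
    y, path = x_start, [x_start]
    for _ in range(n_steps):
        u = feedback(y, x0, dx, V, controls, f, g)
        y = y + h * f(y, u)
        path.append(y)
    return path

# Hypothetical toy problem: y' = u, u in {-1, 0, 1}, g(x, u) = x^2, lambda = 1.
f = lambda x, u: u
g = lambda x, u: x * x
n, x0, dx = 201, -1.0, 0.01
# Closed-form value function of this toy problem (derived by hand for the sketch),
# sampled on the grid; in practice V would come from the numerical HJB solver.
V = [x * x - 2 * abs(x) + 2 - 2 * math.exp(-abs(x))
     for x in (x0 + i * dx for i in range(n))]
path = closed_loop(0.8, 0.01, 200, x0, dx, V, [-1.0, 0.0, 1.0], f, g)
```

With a discrete control set the closed-loop trajectory is driven to the origin and then chatters around it with amplitude of the order of the Euler step `h`.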
The Zermelo navigation problem
[Figure: two surface plots, titled "Value Function" and "Feedback".]
How can we compute the value function?
The bottleneck of the DP approach is the computation of the value function, since this requires solving a nonlinear PDE in high dimension.
This is a challenging problem due to the huge number of nodes involved and to the singularities of the solution.
This goal has motivated new efforts in several directions:
   Domain Decomposition
   Fast Marching / Fast Sweeping Methods
   Accelerated iterative schemes
   High-order and adaptive grid methods
Semi-Lagrangian discretization of HJB
Dynamic Programming Principle
\[
v(x) = \min_{u \in \mathcal{U}} \left\{ \int_{t_0}^{\tau} e^{-\lambda s} g(y_x(s), u(s))\, ds + v(y_x(\tau))\, e^{-\lambda \tau} \right\}
\]
Time-discrete approximation via Value Iteration
\[
V_i^{k+1} = \min_{u \in U} \{ e^{-\lambda \Delta t}\, V^k(x_i + \Delta t\, f(x_i, u)) + \Delta t\, g(x_i, u) \}
\]
Fix a grid in $\Omega$ with $\Omega \subset \mathbb{R}^n$ bounded. Steps: $\Delta x$. Nodes: $\{x_1, \dots, x_N\}$. Discrete solution: $V_i \approx v(x_i)$.
Stability for large time steps $\Delta t$.
Semi-Lagrangian discretization of HJB
The standard way to solve this system is Value Iteration (VI).
Fully discrete SL-FEM / Value Iteration (VI) scheme
\[
V^{k+1} = T(V^k), \qquad
\big( T(V^k) \big)_i \equiv \min_{u \in U} \{ e^{-\lambda \Delta t}\, I[V^k](x_i + \Delta t\, f(x_i, u)) + \Delta t\, g(x_i, u) \}, \quad i = 1, \dots, N
\]
Some advantages:
   Simple to implement (for $I = I_1$, the $P_1$ interpolation operator).
   (VI) converges under very general assumptions.
WARNING: Rather expensive in terms of CPU time, since $\beta = e^{-\lambda \Delta t}$, the Lipschitz constant of $T$, goes to 1 when $\Delta t \to 0$.
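A minimal sketch of the fully discrete value iteration in 1D with $P_1$ interpolation on a uniform grid. The toy problem ($\dot y = u$, $u \in \{-1, 0, 1\}$, $g = x^2$, $\lambda = 1$ on $[-1, 1]$), the clamping of arrival points to the domain, and the names `interp` and `value_iteration` are assumptions made for illustration.

```python
import math

def interp(x0, dx, values, x):
    """P1 interpolation I_1[V](x) on a uniform grid; points outside are clamped."""
    s = min(max((x - x0) / dx, 0.0), len(values) - 1.0)
    i = min(int(s), len(values) - 2)
    w = s - i
    return (1.0 - w) * values[i] + w * values[i + 1]

def value_iteration(n, x0, dx, controls, f, g, lam, dt, tol=1e-8, max_iter=20000):
    """Iterate V^{k+1} = T(V^k) until the sup-norm update falls below tol."""
    beta = math.exp(-lam * dt)          # contraction constant of T
    nodes = [x0 + i * dx for i in range(n)]
    V = [0.0] * n                       # any initial guess works
    for k in range(1, max_iter + 1):
        V_new = [min(beta * interp(x0, dx, V, x + dt * f(x, u)) + dt * g(x, u)
                     for u in controls)
                 for x in nodes]
        err = max(abs(a - b) for a, b in zip(V_new, V))
        V = V_new
        if err < tol:
            return V, k
    return V, max_iter

# Hypothetical toy problem: y' = u, u in {-1, 0, 1}, g(x, u) = x^2, lambda = 1.
f = lambda x, u: u
g = lambda x, u: x * x
V, n_iter = value_iteration(101, -1.0, 0.02, [-1.0, 0.0, 1.0], f, g, lam=1.0, dt=0.02)
```

Because consecutive updates shrink by at most a factor $\beta$ per sweep, the iteration count grows roughly like $\ln(1/\text{tol}) / (\lambda \Delta t)$, which is the cost issue flagged in the warning above.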
Semi-Lagrangian discretization of HJB
Fully-discrete approximation (Value Iteration)
\[
V_i^{k+1} = \min_{u \in U} \{ e^{-\lambda \Delta t}\, I[V^k](x_i + \Delta t\, f(x_i, u)) + \Delta t\, L(x_i, u) \}
\]
Here $L$ is the running cost (denoted $g$ on the other slides).
This algorithm converges for any initial guess $V^0$.
Error estimate [F. 1987]:
\[
\max_{i \in N_G} | v(x_i) - V_i | \le C\, \Delta t^{1/2} + \frac{L_f}{\lambda (\lambda - L_f)} \frac{\Delta x}{\Delta t}.
\]
$N_G$ = number of nodes, $L_f$ = Lipschitz constant of the dynamics $f$.
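One way to read the estimate, assuming the constants do not depend on $\Delta t$ and $\Delta x$: the two terms pull in opposite directions as $\Delta t \to 0$, so a natural coupling balances them. This is a worked consequence of the bound as written, not a statement from the slide.

```latex
\[
\Delta t^{1/2} \sim \frac{\Delta x}{\Delta t}
\;\Longrightarrow\;
\Delta t \sim \Delta x^{2/3}
\;\Longrightarrow\;
\max_i | v(x_i) - V_i | = O\big( \Delta x^{1/3} \big).
\]
```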
Policy iteration
An alternative way to solve this problem is iteration in the policy space [Bellman 1955, Howard 1960].
Fully discrete SL-FEM / Policy Iteration (PI) scheme
1. Fix $u_i^0 \in U$, for $i = 1, \dots, K$.
2. Solve $(V^k)_i = \beta\, I_1[V^k](x_i + \Delta t\, f(x_i, u_i^k)) + \Delta t\, g(x_i, u_i^k)$.
3. Update $u_i^{k+1} = \arg\min_{u \in U} \{ \beta\, I_1[V^k](x_i + \Delta t\, f(x_i, u)) + \Delta t\, g(x_i, u) \}$.
4. Repeat until the convergence criteria are met (they can be set on $V$ or $u$). Typically we use $\| V^{k+1} - V^k \|_\infty < \epsilon$ as stopping rule.
Note that in Step 2 the control is frozen, so the equation for $V^k$ is linear.
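A minimal sketch of the four steps in 1D. The toy problem ($\dot y = u$, $u \in \{-1, 0, 1\}$, $g = x^2$, $\lambda = 1$), the clamping at the boundary, and the names `interp_weights` and `policy_iteration` are assumptions for illustration. Step 2 is solved here by fixed-point sweeps of the frozen-control linear map instead of a direct linear solve, which keeps the sketch free of linear algebra libraries.

```python
import math

def interp_weights(x0, dx, n, x):
    """Index i and weight w such that I_1[V](x) = (1-w)*V[i] + w*V[i+1] (clamped)."""
    s = min(max((x - x0) / dx, 0.0), n - 1.0)
    i = min(int(s), n - 2)
    return i, s - i

def policy_iteration(n, x0, dx, controls, u_init, f, g, lam, dt,
                     tol=1e-8, max_outer=100):
    beta = math.exp(-lam * dt)
    nodes = [x0 + i * dx for i in range(n)]
    policy = [u_init] * n                      # Step 1: initial control at every node
    V_old = [0.0] * n
    for outer in range(1, max_outer + 1):
        # Step 2: policy evaluation with the control frozen (linear problem).
        # Solved by fixed-point sweeps: an assumption of this sketch.
        stencil = [interp_weights(x0, dx, n, x + dt * f(x, u))
                   for x, u in zip(nodes, policy)]
        cost = [dt * g(x, u) for x, u in zip(nodes, policy)]
        V = V_old[:]
        while True:
            V_next = [beta * ((1.0 - w) * V[i] + w * V[i + 1]) + c
                      for (i, w), c in zip(stencil, cost)]
            done = max(abs(a - b) for a, b in zip(V_next, V)) < 0.01 * tol
            V = V_next
            if done:
                break

        # Step 3: policy improvement, node by node.
        def q(x, u):
            i, w = interp_weights(x0, dx, n, x + dt * f(x, u))
            return beta * ((1.0 - w) * V[i] + w * V[i + 1]) + dt * g(x, u)
        policy = [min(controls, key=lambda u: q(x, u)) for x in nodes]

        # Step 4: stopping rule on V in the sup norm.
        if max(abs(a - b) for a, b in zip(V, V_old)) < tol:
            return V, policy, outer
        V_old = V
    return V, policy, max_outer

# Hypothetical toy problem: y' = u, u in {-1, 0, 1}, g(x, u) = x^2, lambda = 1.
f = lambda x, u: u
g = lambda x, u: x * x
V, policy, n_outer = policy_iteration(51, -1.0, 0.04, [-1.0, 0.0, 1.0], 0.0,
                                      f, g, lam=1.0, dt=0.04)
```

For larger grids the evaluation step would normally be a sparse linear solve, since the interpolation stencil touches only a few nodes per row.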