Optimal Control and Dynamic Programming


  1. Optimal Control and Dynamic Programming 4SC000 Q2 2017-2018 Duarte Antunes

  2. Part III Continuous-time optimal control problems

  3. Recap
     Discrete optimization problems vs. stage decision problems:
     • Formulation: transition diagram | dynamic system & additive cost function
     • DP algorithm: graphical DP algorithm & DP equation | DP equation
     • Partial information: Bayesian inference & decisions based on prob. distribution | Kalman filter and separation principle
     • Alternative algorithms: Dijkstra's algorithm | static optimization

  4. Goals of part III
     Introduce optimal control concepts for continuous-time optimal control problems.
     Discrete optimization problems | stage decision problems | continuous-time control problems:
     • Formulation: transition diagram | discrete-time system & additive cost function | differential equations & additive cost function
     • DP algorithm: graphical DP algorithm & DP equation | DP equation | Hamilton Jacobi Bellman equation
     • Partial information: Bayesian inference & decisions based on prob. distribution | Kalman filter and separation principle | continuous-time Kalman filter and separation principle
     • Alternative algorithms: Dijkstra's algorithm | static optimization | Pontryagin's maximum principle
     We will also analyze frequency-domain properties of continuous-time LQR/LQG.

  5. Outline
     • Problem formulation and approach
     • Hamilton Jacobi Bellman equation
     • Linear quadratic regulator

  6. Continuous-time optimal control problems
     Dynamic model
       ẋ(t) = f(x(t), u(t)),  x(0) = x_0,  t ∈ [0, T]
     Cost function
       ∫_0^T g(x(t), u(t)) dt + g_T(x(T))
     Assumptions
     • The differential equation has a unique solution in t ∈ [0, T]
     • We assume that f, g do not explicitly depend on time for simplicity; we could also consider f(t, x(t), u(t)), g(t, x(t), u(t))
     • x(t) ∈ R^n and u(t) ∈ U ⊆ R^m
     The goal is to find an optimal path and an optimal policy.

  7. Optimal path
     • A path (u(t), x(t)), t ∈ [0, T], consists of a control input u(t) and a corresponding solution x(t) of the differential equation
         ẋ(t) = f(x(t), u(t)),  x(0) = x_0,  t ∈ [0, T]
     • A path is said to be optimal if there is no other path with a smaller cost
         ∫_0^T g(x(t), u(t)) dt + g_T(x(T))
     • Choosing the control input can be seen as making decisions in infinitesimal time intervals which shape the derivative of the state (and thus determine its evolution).
     [Figure: a state trajectory reaching x(T) at t = T.]

  8. Optimal policy
     • A policy μ is a function which maps states into actions at every time,
         u(t) = μ(t, x(t)),  t ∈ [0, T]
     • A policy μ is said to be optimal if, for every state x(t) = x̄ at every time t, the cost
         ∫_t^T g(x(s), μ(s, x(s))) ds + g_T(x(T))
       coincides with the cost of the optimal path of the problem
         ẋ(s) = f(x(s), u(s)),  x(t) = x̄,  s ∈ [t, T]
         min ∫_t^T g(x(s), u(s)) ds + g_T(x(T))
     • We denote the cost of the latter problem by J(t, x̄), the optimal cost-to-go.

  9. Approach
     • Dynamic programming (DP) shall allow us to compute optimal policies and optimal paths, and Pontryagin's maximum principle (PMP) shall allow us to compute optimal paths.
     • However, obtaining these results in continuous time (CT) is mathematically involved.
     • To gain intuition, in both cases we will first discretize the problem as a function of the discretization step τ (previously the sampling period), apply DP, and take the limit as the discretization step converges to zero.
     [Diagram: CT optimal control problem → discretization with step τ → stage decision problem → DT DP → optimal path and policy → taking the limit τ → 0 → CT optimal path and policy; CT DP acts on the CT problem directly.]

  10. Example
      How to charge the capacitor in an RC circuit with minimum energy loss in the resistor?
        min_{u(t)} ∫_0^T (x(t) − u(t))^2 / R dt
        s.t.  ẋ(t) = (1/(RC)) (u(t) − x(t)),  x(0) = 0,  x(T) = x_desired
      [Circuit diagram: source u, resistor R carrying current i, capacitor C with voltage x.]
      Let us consider R = C = T = x_desired = 1.
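
As a quick sanity check of this example (an illustration, not part of the slides), the sketch below simulates the circuit for the candidate input u(t) = 1 + t, for which the dynamics ẋ = u − x with R = C = 1 give x(t) = t, and numerically evaluates the energy loss in the resistor; this particular input anticipates the optimal solution found later in the lecture.

```python
from scipy.integrate import quad, solve_ivp

# RC circuit with R = C = 1: x'(t) = u(t) - x(t), x(0) = 0.
# Candidate input u(t) = 1 + t (the solution the slides later converge to).
u = lambda t: 1.0 + t

def rhs(t, x):
    return u(t) - x

sol = solve_ivp(rhs, (0.0, 1.0), [0.0], dense_output=True, rtol=1e-10, atol=1e-12)

# Energy loss in the resistor: integral of (x(t) - u(t))^2 over [0, 1].
loss, _ = quad(lambda t: (sol.sol(t)[0] - u(t)) ** 2, 0.0, 1.0)

print("x(1) =", sol.sol(1.0)[0])   # approximately 1.0, the desired terminal value
print("energy loss =", loss)       # approximately 1.0
```

With this input the capacitor voltage ramps linearly from 0 to 1 while the voltage across the resistor stays constant, so the loss integrand is constant and the total loss is 1.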

  11. Discretization
      Discretization times t_k = kτ, k = 0, ..., h, with discretization step τ and hτ = T.
      Dynamic model (exact discretization, with R = C = 1 and u(t) = u_k held constant on [t_k, t_{k+1}))
        x(t) = e^{−(t − t_k)} x(t_k) + (1 − e^{−(t − t_k)}) u(t_k),  t ∈ [t_k, t_{k+1})
        with x_k := x(t_k), u_k := u(t_k):
        x_{k+1} = e^{−τ} x_k + (1 − e^{−τ}) u_k
      Cost function
        ∫_0^1 (x(t) − u(t))^2 dt = Σ_{k=0}^{h−1} ∫_{t_k}^{t_{k+1}} (e^{−(t − t_k)} x_k + (1 − e^{−(t − t_k)}) u_k − u_k)^2 dt
                                 = Σ_{k=0}^{h−1} (x_k − u_k)^2 ∫_{t_k}^{t_{k+1}} e^{−2(t − t_k)} dt
                                 = ((1 − e^{−2τ})/2) Σ_{k=0}^{h−1} (x_k − u_k)^2
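
This exact discretization is easy to verify numerically. The sketch below (an illustration, not from the slides; the grid size h = 10 is arbitrary) propagates the discrete model for a piecewise-constant input and checks that the closed-form discrete cost matches a direct numerical integration of the continuous-time cost.

```python
import numpy as np

T, h = 1.0, 10
tau = T / h                            # discretization step
a, b = np.exp(-tau), 1 - np.exp(-tau)

# Arbitrary input, held constant on each interval [t_k, t_{k+1}).
u = 1.0 + tau * np.arange(h)

# Exact discretization of x' = u - x with x(0) = 0.
x = np.zeros(h + 1)
for k in range(h):
    x[k + 1] = a * x[k] + b * u[k]

# Discrete cost from the closed-form expression above.
cost_discrete = (1 - np.exp(-2 * tau)) / 2 * np.sum((x[:h] - u) ** 2)

# Direct numerical integration of (x(t) - u(t))^2 over [0, 1] for the same input
# (midpoint rule on a fine grid within each interval).
cost_ct = 0.0
m = 2000
for k in range(h):
    s = (np.arange(m) + 0.5) * (tau / m)            # midpoints within [t_k, t_{k+1})
    xt = np.exp(-s) * x[k] + (1 - np.exp(-s)) * u[k]
    cost_ct += np.sum((xt - u[k]) ** 2) * (tau / m)

print(cost_discrete, cost_ct)          # the two values agree up to quadrature error
```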

  12. From terminal constraint to terminal cost
      The framework of stage decision problems does not take terminal constraints into account. Thus we apply a trick: a final control input is applied at the terminal time, setting the state to the desired terminal value after ∆ seconds, x(1 + ∆) = 1.
      [Plot: x(t) reaching 1 at time 1 + ∆, just after t = 1.]
      Since x(1 + ∆) = e^{−∆} x(1) + (1 − e^{−∆}) u(1), this terminal control input is given by
        u(1) = (1 − e^{−∆} x(1)) / (1 − e^{−∆})

  13. From terminal constraint to terminal cost
      The following cost approximates the original one that we are interested in:
        ∫_0^{1+∆} (x(t) − u(t))^2 dt = ∫_0^1 (x(t) − u(t))^2 dt + ∫_1^{1+∆} (x(t) − u(t))^2 dt
                                     = ((1 − e^{−2τ})/2) Σ_{k=0}^{h−1} (x_k − u_k)^2 + ((1 − e^{−2∆})/2) (x_h − u_h)^2
                                     = ((1 − e^{−2τ})/2) Σ_{k=0}^{h−1} (x_k − u_k)^2 + γ(∆)(x_h − 1)^2
      where u_h = (1 − e^{−∆} x_h)/(1 − e^{−∆}) and the terminal cost weight is
        γ(∆) = (1 − e^{−2∆}) / (2(1 − e^{−∆})^2)
      Note that γ(∆) → ∞ as ∆ → 0, but γ(∆)(x_h − 1)^2 → 0 if x_h → 1.
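
A small numerical illustration (not from the slides) of how this weight behaves: γ(∆) grows roughly like 1/∆, so for small ∆ the terminal penalty is negligible only when x_h is very close to 1.

```python
import numpy as np

def gamma(delta):
    # Terminal-cost weight replacing the constraint x(1) = 1.
    return (1 - np.exp(-2 * delta)) / (2 * (1 - np.exp(-delta)) ** 2)

for delta in (0.1, 0.01, 0.001):
    print(f"delta = {delta:>6}: gamma = {gamma(delta):8.1f}, "
          f"penalty at x_h = 0.99: {gamma(delta) * (0.99 - 1) ** 2:.4f}")
```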

  14. Dynamic programming
      Applying DP,
        J_k(x_k) = min_{u_k} ((1 − e^{−2τ})/2)(x_k − u_k)^2 + J_{k+1}(e^{−τ} x_k + (1 − e^{−τ}) u_k)
        J_h(x_h) = γ(∆)(x_h − 1)^2
      results in
        u_k = K_k x_k + α_k
        J_k(x_k) = θ_k x_k^2 + γ_k x_k + β_k
      obtained from Riccati equations.
      Example (∆ = 0.01, τ = 0.2): [plots of x(t) and u(t) over t ∈ [0, 1]].
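
A minimal sketch of this backward recursion (an illustration, not from the slides): the cost-to-go coefficients θ_k, γ_k, β_k and the affine policy gains K_k, α_k are obtained by explicitly minimizing over u_k at each step, and the closed loop is then simulated forwards. The coefficient names follow the slide; the values τ = 0.01 and ∆ = 0.001 correspond to one of the cases shown on the next slide.

```python
import numpy as np

T, tau, Delta = 1.0, 0.01, 0.001
h = int(round(T / tau))
a, b = np.exp(-tau), 1 - np.exp(-tau)          # x_{k+1} = a x_k + b u_k
c = (1 - np.exp(-2 * tau)) / 2                 # stage cost c (x_k - u_k)^2
gam = (1 - np.exp(-2 * Delta)) / (2 * (1 - np.exp(-Delta)) ** 2)

# Terminal cost gamma(Delta) (x - 1)^2 written as theta x^2 + g x + beta.
theta, g, beta = gam, -2 * gam, gam
K, alpha = np.zeros(h), np.zeros(h)

# Backward recursion: J_k(x) = min_u c (x - u)^2 + J_{k+1}(a x + b u),
# with J_{k+1}(y) = theta y^2 + g y + beta quadratic.
for k in range(h - 1, -1, -1):
    denom = 2 * c + 2 * theta * b ** 2
    K[k] = (2 * c - 2 * theta * a * b) / denom   # u_k = K_k x_k + alpha_k
    alpha[k] = -g * b / denom
    A = a + b * K[k]                             # closed-loop coefficient of x
    theta, g, beta = (
        c * (1 - K[k]) ** 2 + theta * A ** 2,
        -2 * c * alpha[k] * (1 - K[k]) + 2 * theta * A * b * alpha[k] + g * A,
        c * alpha[k] ** 2 + theta * (b * alpha[k]) ** 2 + g * b * alpha[k] + beta,
    )

# Forward simulation from x_0 = 0 under the computed policy.
x, u = np.zeros(h + 1), np.zeros(h)
for k in range(h):
    u[k] = K[k] * x[k] + alpha[k]
    x[k + 1] = a * x[k] + b * u[k]

t = tau * np.arange(h)
# Deviations from x(t) = t and u(t) = 1 + t; both should be small for small tau, Delta.
print(np.max(np.abs(x[:h] - t)), np.max(np.abs(u - (1 + t))))
```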

  15. Taking the limit τ → 0
      [Plots of x(t) and u(t) for (∆ = 0.01, τ = 0.05), (∆ = 0.01, τ = 0.01) and (∆ = 0.001, τ = 0.01).]
      The solution seems to be converging to u(t) = 1 + t, x(t) = t. Later we will prove this.

  16. Static optimization
      Static optimization problem, which can handle constraints:
        min_{u_0, ..., u_{h−1}} ((1 − e^{−2τ})/2) Σ_{k=0}^{h−1} (x_k − u_k)^2
        s.t.  x_{k+1} = e^{−τ} x_k + (1 − e^{−τ}) u_k,  k ∈ {0, ..., h−1}
              x_0 = 0,  x_h = 1
      Lagrangian
        L(x_1, u_0, λ_1, ..., x_{h−1}, u_{h−1}, λ_h) = ((1 − e^{−2τ})/2) Σ_{k=0}^{h−1} (x_k − u_k)^2 + Σ_{k=0}^{h−1} λ_{k+1} (e^{−τ} x_k + (1 − e^{−τ}) u_k − x_{k+1})
      The necessary optimality conditions amount to solving a linear system (when x_0 = 0, x_h = 1):
        ∂L/∂x_k = 0:   λ_k = (1 − e^{−2τ})(x_k − u_k) + λ_{k+1} e^{−τ},   k ∈ {1, ..., h−1}
        ∂L/∂u_k = 0:   0 = −(1 − e^{−2τ})(x_k − u_k) + λ_{k+1}(1 − e^{−τ}),   k ∈ {0, ..., h−1}
        ∂L/∂λ_{k+1} = 0:   x_{k+1} = e^{−τ} x_k + (1 − e^{−τ}) u_k,   k ∈ {0, ..., h−1}
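
The sketch below (an illustration, not from the slides) assembles these conditions into a linear system in the unknowns x_1, ..., x_{h−1}, u_0, ..., u_{h−1}, λ_1, ..., λ_h and solves it with numpy; the variable ordering and index helpers are an arbitrary choice.

```python
import numpy as np

T, h = 1.0, 100
tau = T / h
a, b, q = np.exp(-tau), 1 - np.exp(-tau), 1 - np.exp(-2 * tau)
x0, xh = 0.0, 1.0                    # boundary conditions x_0 = 0, x_h = 1

# Unknowns: x_1..x_{h-1}, u_0..u_{h-1}, lambda_1..lambda_h (3h - 1 in total).
n = 3 * h - 1
ix = lambda k: k - 1                 # x_k,      k = 1..h-1
iu = lambda k: (h - 1) + k           # u_k,      k = 0..h-1
il = lambda k: (2 * h - 1) + k - 1   # lambda_k, k = 1..h

A, rhs = np.zeros((n, n)), np.zeros(n)
row = 0

# dL/dx_k = 0:  q (x_k - u_k) + a lambda_{k+1} - lambda_k = 0,  k = 1..h-1
for k in range(1, h):
    A[row, ix(k)], A[row, iu(k)] = q, -q
    A[row, il(k + 1)], A[row, il(k)] = a, -1.0
    row += 1

# dL/du_k = 0:  -q (x_k - u_k) + b lambda_{k+1} = 0,  k = 0..h-1
for k in range(h):
    if k >= 1:
        A[row, ix(k)] = -q
    else:
        rhs[row] += q * x0           # x_0 is known, moved to the right-hand side
    A[row, iu(k)], A[row, il(k + 1)] = q, b
    row += 1

# Dynamics:  a x_k + b u_k - x_{k+1} = 0,  k = 0..h-1
for k in range(h):
    if k >= 1:
        A[row, ix(k)] = a
    else:
        rhs[row] -= a * x0
    A[row, iu(k)] = b
    if k + 1 <= h - 1:
        A[row, ix(k + 1)] = -1.0
    else:
        rhs[row] += xh               # x_h = 1 is known
    row += 1

z = np.linalg.solve(A, rhs)
u = z[h - 1:2 * h - 1]
t = tau * np.arange(h)
print(np.max(np.abs(u - (1 + t))))   # deviation from u(t) = 1 + t, small for small tau
```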

  17. Taking the limit τ → 0
      [Plots of x(t) and u(t) for τ = 0.2, τ = 0.05 and τ = 0.01.]
      Again, the solution seems to be converging to u(t) = 1 + t, x(t) = t.

  18. Discussion
      • In this lecture we follow this discretization approach (the more formal continuous-time approach can be found in Bertsekas' book) to derive the counterpart of DP for continuous-time control problems, which is the Hamilton Jacobi Bellman equation.
      • Later we will use both the discretization approach and the continuous-time approach to derive Pontryagin's maximum principle.
      • With such tools we will be able to establish the optimal solution for charging the capacitor, and to solve many other problems.
      [Diagram: CT optimal control problem → discretization with step τ → stage decision problem → DT DP / DT PMP → optimal path and policy → taking the limit τ → 0 → CT optimal path and policy; CT DP and CT PMP act on the CT problem directly.]

  19. Outline
      • Problem formulation and approach
      • Hamilton Jacobi Bellman equation
      • Linear quadratic regulator

  20. Discretization approach
      Discretization times t_k = kτ, with discretization step τ and hτ = T.
      Dynamic model
        ẋ(t) = f(x(t), u(t)),  x(0) = x_0,  t ∈ [0, T]
        →  x_{k+1} = x_k + τ f(x_k, u_k),  with x_k = x(kτ), u_k = u(kτ)
      Cost function
        ∫_0^T g(x(t), u(t)) dt + g_T(x(T))
        →  Σ_{k=0}^{h−1} g(x_k, u_k) τ + g_h(x_h),  with g_h(x) = g_T(x) for all x
      • Note that these are approximate discretizations. We could have considered an exact discretization, as in the linear case, but this approximation will suffice.
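
A small generic sketch of this forward-Euler discretization (an illustration, not from the slides): the helper below builds the discretized cost for arbitrary f, g and g_T and is applied to the RC example, with the terminal cost γ(∆)(x − 1)^2 standing in for the constraint x(T) = 1; the function name and parameters are illustrative.

```python
import numpy as np

def euler_discretized_cost(f, g, gT, x0, u_seq, T):
    """Forward-Euler discretization: x_{k+1} = x_k + tau f(x_k, u_k),
    cost = sum_k g(x_k, u_k) tau + gT(x_h)."""
    h = len(u_seq)
    tau = T / h
    x, cost = x0, 0.0
    for u in u_seq:
        cost += g(x, u) * tau
        x = x + tau * f(x, u)
    return cost + gT(x)

# Illustration on the RC example (R = C = 1) with the candidate input u_k = 1 + t_k.
T, h, Delta = 1.0, 1000, 0.001
gam = (1 - np.exp(-2 * Delta)) / (2 * (1 - np.exp(-Delta)) ** 2)
u_seq = 1 + (T / h) * np.arange(h)
cost = euler_discretized_cost(
    f=lambda x, u: u - x,
    g=lambda x, u: (x - u) ** 2,
    gT=lambda x: gam * (x - 1) ** 2,
    x0=0.0, u_seq=u_seq, T=T,
)
print(cost)   # close to 1, the continuous-time cost of this input
```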
