  1. Optimal Control and Dynamic Programming 4SC000 Q2 2017-2018 Duarte Antunes

  2. Motivation
  • In the last two lectures we discussed two alternative methods to DP for stage decision problems: discretization and static optimization.
  • Discretization allows us to obtain an optimal policy (approximately) when the dimension of the problem (state and input dimension) is small, and static optimization allows us to obtain an optimal path when the problem is convex.

      dimension        convexity     DP discretization   static optimization
      small (n ≤ 3)    convex        yes                 yes
      small (n ≤ 3)    non-convex    yes                 no
      large            convex        no                  yes
      large            non-convex    no                  no

  In this lecture:
  • we show how to obtain optimal policies using static optimization and certainty equivalent control (lec. 3, slide 33), applying this to solve linear quadratic control with input constraints;
  • we discuss related approximate dynamic programming techniques such as MPC and rollout, applying them to a non-convex, large-dimensional problem (control of switched linear systems).

  3. Outline • Approximate dynamic programming • Linear quadratic control with inequality constraints • Control of switched systems

  4. Challenge
  Iterating the dynamic programming algorithm for stage decision problems is hard since it is hard to compute the costs-to-go (see e.g. slide 11 of lecture 5):

      J_k(x_k) = min_{u_k} g_k(x_k, u_k) + J_{k+1}(f_k(x_k, u_k))

  • J_{k+1}(x_{k+1}) is typically unknown (it is known only for the terminal stage), and it is hard to obtain an expression for this cost-to-go.
  • Even with J_{k+1} available, for each x_k it is hard to minimize g_k(x_k, u_k) + J_{k+1}(f_k(x_k, u_k)) as a function of u_k.
  • Moreover:
    - discretization is only possible if the state and input spaces are not large;
    - static optimization only assures optimality for convex problems and only allows us to compute optimal paths.
  How do we obtain a (sub)optimal policy?
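  To make the difficulty concrete, here is a minimal sketch of the exact backward recursion for a hypothetical discretized scalar problem (the dynamics f, stage cost g, horizon and grids are invented purely for illustration and are not from the slides). The cost-to-go J_k has to be tabulated for every grid state at every stage, which is what becomes intractable as the state and input dimensions grow.

```python
import numpy as np

h = 10                                    # horizon
xs = np.linspace(-2.0, 2.0, 201)          # discretized state grid
us = np.linspace(-1.0, 1.0, 51)           # discretized input grid

def f(x, u):                              # assumed dynamics x_{k+1} = f(x_k, u_k)
    return 0.9 * x + u

def g(x, u):                              # assumed stage cost g_k(x_k, u_k)
    return x**2 + 0.1 * u**2

J = xs**2                                 # terminal cost g_h: the only cost-to-go known a priori
policy = np.zeros((h, xs.size), dtype=int)

for k in reversed(range(h)):
    Q = np.empty((xs.size, us.size))
    for j, u in enumerate(us):
        x_next = f(xs, u)
        # interpolate J_{k+1} at successor states (np.interp clips outside the grid)
        Q[:, j] = g(xs, u) + np.interp(x_next, xs, J)
    policy[k] = Q.argmin(axis=1)          # tabulated optimal input index for every grid state
    J = Q.min(axis=1)                     # cost-to-go J_k on the whole grid
```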

  5. Idea I - Certainty equivalent control
  • Compute an optimal path online over the remaining stages k, k+1, ..., h, starting from the current state x_k, and apply only the first decision u_k.
  • Repeat this procedure at every stage for the current state (at stage k+1, re-optimize starting from x_{k+1}).

  6. Idea II - MPC
  • Similar to Idea I, but compute decisions only over a (short) horizon: compute an optimal path over the stages k, ..., k+H starting from the current state x_k and apply only the first decision.
  • Repeat this procedure in a receding/rolling horizon way (at stage k+1, re-optimize over k+1, ..., k+1+H starting from x_{k+1}).
  • The optimization problem to solve at each stage is then much simpler, which makes the algorithm feasible to run online. This is the fundamental idea of Model Predictive Control (MPC).
  • Note that this is (in general) not optimal, not even for problems without disturbances.

  7. Idea III - Rollout
  • Similar to Ideas I and II, but after the optimization horizon a base policy is used; apply the first decision of the optimization procedure.
  • Repeat this procedure in a receding/rolling horizon way.
  • The optimization problem to solve at each stage is then much simpler, which makes the algorithm feasible to run online. This is the fundamental idea of rollout.
  • Note that this is (in general) not optimal, not even for problems without disturbances.

  8. Approximate dynamic programming
  Approximate the cost-to-go and use the minimizer of the following equation as the control input u_k (in general different from the optimal control input!):

      J_k(x_k) = min_{u_k} g_k(x_k, u_k) + \tilde{J}_{k+1}(f_k(x_k, u_k))

  • \tilde{J}_{k+1}(x_{k+1}) is known by construction; it replaces the typically unknown J_{k+1}(x_{k+1}).
  • For each state x_k we just need to minimize this function of u_k and compute one action u_k^*.
  • This is more general than the previous ideas!
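  A minimal sketch of this generic ADP step, in which the model functions f and g, the approximation J_tilde, and the candidate-input grid are placeholders supplied by the user (none of these names come from the slides): the minimization is carried out only for the measured state, over a finite set of candidate inputs.

```python
import numpy as np

def adp_control(k, x, f, g, J_tilde, u_candidates):
    """One-step lookahead: return the u minimizing g_k(x, u) + J~_{k+1}(f_k(x, u))."""
    costs = [g(k, x, u) + J_tilde(k + 1, f(k, x, u)) for u in u_candidates]
    return u_candidates[int(np.argmin(costs))]

# illustrative use with a crude quadratic guess for the cost-to-go
u_star = adp_control(
    k=0, x=1.0,
    f=lambda k, x, u: 0.9 * x + u,
    g=lambda k, x, u: x**2 + 0.1 * u**2,
    J_tilde=lambda k, x: 2.0 * x**2,      # assumed approximation of J_{k+1}
    u_candidates=np.linspace(-1.0, 1.0, 51),
)
```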

  9. Discussion
  • Ideas I, II, III can be seen as special cases of approximate dynamic programming.
  • Idea I - for problems with disturbances, the true cost (requiring the computation of expected values) is approximated by a deterministic cost. For problems without disturbances, Idea I yields the same policy as DP.
  • Idea II - the approximation is the deterministic cost over a short horizon. This is an approximation even for problems without disturbances.
  • Idea III - the approximation is the cost of the base policy when H = 1, and it is the cost over a short horizon plus the cost of the base policy if the horizon is larger than one. Again, this is an approximation even for problems without disturbances.
  • Although we mainly consider finite-horizon costs, the ideas extend naturally to infinite-horizon costs.
  We formalize these statements next.

  10. Certainty equivalent control
  Certainty equivalent control is the following implicit policy for a stage decision problem with dynamics x_{k+1} = f_k(x_k, u_k, w_k) and cost \sum_{k=0}^{h-1} g_k(x_k, u_k) + g_h(x_h):

  1. At each stage k (initially k = 0), assuming that the full state x_k is measured, solve the following problem for initial condition x_k, assuming no disturbances (w_k = 0, w_{k+1} = 0, ...):

      min \sum_{\ell=k}^{h-1} g_\ell(x_\ell, u_\ell) + g_h(x_h)
      s.t. x_{\ell+1} = f_\ell(x_\ell, u_\ell, 0),  \ell \in {k, k+1, ..., h-1}

     and obtain estimates of the optimal control inputs u_k, u_{k+1}, ..., u_{h-1}.

  2. Take the first decision u_k and apply it to the process (due to disturbances the system will evolve to x_{k+1} = f_k(x_k, u_k, w_k), in general different from f_k(x_k, u_k, 0)). Repeat (go to 1.).
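  The two steps above can be sketched as follows (a minimal illustration assuming a scalar input, stage-invariant f and g, and a `plant` function standing in for the true disturbed system; all names are invented for the example). At each stage the whole remaining deterministic problem is re-solved and only the first input is applied.

```python
import numpy as np
from scipy.optimize import minimize

def cec_step(x_k, k, h, f, g, g_h):
    """Step 1: solve the deterministic problem from stage k to h and return u_k."""
    n_u = h - k                              # remaining decisions u_k, ..., u_{h-1}

    def total_cost(u_seq):
        x, cost = x_k, 0.0
        for u in u_seq:
            cost += g(x, u)
            x = f(x, u, 0.0)                 # certainty equivalence: disturbances set to zero
        return cost + g_h(x)

    res = minimize(total_cost, np.zeros(n_u))
    return res.x[0]                          # step 2: apply only the first decision

def run_cec(x0, h, f, g, g_h, plant):
    """Closed loop: re-optimize at every stage from the measured state."""
    x = x0
    for k in range(h):
        u = cec_step(x, k, h, f, g, g_h)
        x = plant(x, u)                      # true system, possibly with disturbances
    return x
```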

  11. Certainty equivalent control
  For problems with no disturbances, this is equivalent to solving

      min_{u_k} g_k(x_k, u_k) + \tilde{J}_{k+1}(x_{k+1})
      s.t. x_{\ell+1} = f_\ell(x_\ell, u_\ell, 0),  \ell \in {k, k+1, ..., h-1}     (1)

  with

      \tilde{J}_{k+1}(x_{k+1}) = min_{u_{k+1}, ..., u_{h-1}} \sum_{\ell=k+1}^{h-1} g_\ell(x_\ell, u_\ell) + g_h(x_h)   s.t. (1), initial condition x_{k+1}

  and selecting the first input u_k and applying it.
  Note that, for problems without disturbances, the \tilde{J}_k are the optimal costs-to-go and satisfy the DP equation

      \tilde{J}_k(x_k) = min_{u_k} g_k(x_k, u_k) + \tilde{J}_{k+1}(x_{k+1})

  For problems with stochastic disturbances, the \tilde{J}_k are only approximations of the optimal costs-to-go, which satisfy

      J_k(x_k) = min_{u_k} g_k(x_k, u_k) + E[J_{k+1}(x_{k+1})],   x_{\ell+1} = f_\ell(x_\ell, u_\ell, w_\ell),  \ell \in {k, k+1, ..., h-1}

  i.e., E[J_{k+1}(x_{k+1})] is approximated by \tilde{J}_{k+1}(x_{k+1}).

  12. Model Predictive Control
  • At each time k consider the optimal control problem only over a horizon H:

      min_{u_k, u_{k+1}, ..., u_{k+H-1}} \sum_{\ell=k}^{k+H-1} g_\ell(x_\ell, u_\ell)
      s.t. x_{\ell+1} = f_\ell(x_\ell, u_\ell),  \ell \in {k, k+1, ..., k+H-1}     (1)

    and select the first control input u_k resulting from this optimization.
  • Equivalent to

      min_{u_k} g_k(x_k, u_k) + \tilde{J}_{k+1}(x_{k+1})

    with

      \tilde{J}_{k+1}(x_{k+1}) = min_{u_{k+1}, ..., u_{k+H-1}} \sum_{\ell=k+1}^{k+H-1} g_\ell(x_\ell, u_\ell)   s.t. (1), \ell \in {k+1, k+2, ..., k+H-1}
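  A receding-horizon sketch of this optimization, under the same illustrative assumptions as before (scalar input, stage-invariant f and g, invented names): in contrast with certainty-equivalent control, only H decisions are optimized at each stage, however far away the final stage h is.

```python
import numpy as np
from scipy.optimize import minimize

def mpc_step(x_k, H, f, g):
    """Solve the H-step problem from the current state and return only the first input."""
    def horizon_cost(u_seq):
        x, cost = x_k, 0.0
        for u in u_seq:
            cost += g(x, u)
            x = f(x, u)
        return cost

    res = minimize(horizon_cost, np.zeros(H))
    return res.x[0]
```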

  13. Model Predictive Control
  • Note that at the last decision stages, k ∈ {h − H, h − H + 1, ..., h − 1}, the cost function is slightly different for finite-horizon problems, including also the terminal cost:

      min_{u_k, ..., u_{h-1}} \sum_{\ell=k}^{h-1} g_\ell(x_\ell, u_\ell) + g_h(x_h)   s.t. (1), \ell \in {k, k+1, ..., h-1}

  • There are several variants of Model Predictive Control; in particular, some variants use a terminal constraint (this is useful to prove stability). For example, impose that the state after the horizon must be zero, x_{k+H} = 0 (for finite-horizon problems, x_h = 0 at the last stages).
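  The terminal-constraint variant mentioned above can be sketched by imposing x_{k+H} = 0 as an equality constraint on the same H-step problem (illustrative assumptions as before; SciPy's SLSQP method is chosen here simply because it accepts equality constraints).

```python
import numpy as np
from scipy.optimize import minimize

def mpc_step_terminal(x_k, H, f, g):
    def horizon_cost(u_seq):
        x, cost = x_k, 0.0
        for u in u_seq:
            cost += g(x, u)
            x = f(x, u)
        return cost

    def terminal_state(u_seq):               # constrained to zero: x_{k+H} = 0
        x = x_k
        for u in u_seq:
            x = f(x, u)
        return x

    res = minimize(horizon_cost, np.zeros(H), method="SLSQP",
                   constraints=[{"type": "eq", "fun": terminal_state}])
    return res.x[0]
```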

  14. Rollout
  • Similar to MPC, but use a base policy u_k = \bar{\mu}_k(x_k) after the horizon.
  • At each time k consider the optimal control problem over a horizon H, assuming that after the horizon the base policy is used:

      min_{u_k, u_{k+1}, ..., u_{k+H-1}} \sum_{\ell=k}^{k+H-1} g_\ell(x_\ell, u_\ell) + \sum_{\ell=k+H}^{h-1} g_\ell(x_\ell, \bar{\mu}_\ell(x_\ell)) + g_h(x_h)
      s.t. (1),  \ell \in {k, k+1, ..., k+H-1}

    and select the first control input u_k resulting from this optimization.
  • Equivalent to

      min_{u_k} g_k(x_k, u_k) + \tilde{J}_{k+1}(x_{k+1})

    with

      \tilde{J}_{k+1}(x_{k+1}) = min_{u_{k+1}, ..., u_{k+H-1}} \sum_{\ell=k+1}^{k+H-1} g_\ell(x_\ell, u_\ell) + \sum_{\ell=k+H}^{h-1} g_\ell(x_\ell, \bar{\mu}_\ell(x_\ell)) + g_h(x_h)   s.t. (1), \ell \in {k+1, k+2, ..., k+H-1}
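  A rollout sketch along the same lines (scalar input, stage-invariant f and g, and an invented base policy mu_bar assumed): compared with the MPC sketch, the trajectory is continued with the base policy after the H optimized inputs, and its cost plus the terminal cost enters the objective.

```python
import numpy as np
from scipy.optimize import minimize

def rollout_step(x_k, k, H, h, f, g, g_h, mu_bar):
    """Optimize H inputs, complete the trajectory with the base policy, return u_k."""
    n_u = min(H, h - k)                       # optimized decisions u_k, ..., u_{k+n_u-1}

    def cost(u_seq):
        x, total = x_k, 0.0
        for u in u_seq:                       # optimized segment
            total += g(x, u)
            x = f(x, u)
        for _ in range(k + n_u, h):           # base-policy segment after the horizon
            u = mu_bar(x)
            total += g(x, u)
            x = f(x, u)
        return total + g_h(x)                 # terminal cost

    res = minimize(cost, np.zeros(n_u))
    return res.x[0]
```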

  15. Further remarks on ADP
  • The quality of an approximation is measured by how "good" u_k^* is, i.e. how close it is to the optimal input, and not by how "good" the approximation \tilde{J}_{k+1}(x_{k+1}) of J_{k+1}(x_{k+1}) is.
  • Decisions only need to be computed (in real time) for the value of the present state (we do not need to iterate the cost-to-go).
  • There are several variants to approximate the costs-to-go.
  • Due to the heuristic nature of the approximation, it is very hard to quantify when a specific approximation method is good and to establish formal results.
  • For example, we have seen that the optimal policy for the infinite-horizon linear quadratic regulator problem makes the closed loop of a linear system stable. This is typically very hard to establish for approximate methods.

  16. Outline • Approximate dynamic programming • Linear quadratic control with inequality constraints • Control of switched systems
