

  1. Optimal Control and Dynamic Programming 4SC000 Q2 2017-2018 Duarte Antunes

2. Introduction
• We have seen in the two previous lectures (5 and 6) that for stage decision problems with quadratic costs and linear models we can compute the costs-to-go of the DP algorithm and the optimal policy analytically.
• However, linear models with quadratic costs are one of the few cases where this happens; in general, we cannot run DP analytically. Example from slide 11, lecture 5:

Cost: $\sum_{k=0}^{1} x_k^2 + u_k^2 + g_2(x_2)$, with $g_2(x_2) = e^{x_2}$ — Model: $x_{k+1} = x_k + u_k$

• In the next three lectures we will discuss three alternative methods and their pros and cons:
  • Discretization (lecture 7)
  • Static optimization (lectures 7 and 8)
  • Approximate dynamic programming (lecture 9)

3. Outline
• Discretization
• Introduction to static optimization

4. Discretization

Stage decision problems → [state and input discretization] → Discrete optimization problems

• Stage decision problems can be approximated by discrete optimization problems through a process called discretization (also referred to as sampling or quantization).

5. Example

$\sum_{k=0}^{1} x_k^2 + u_k^2 + g_2(x_2), \qquad x_{k+1} = x_k + u_k, \quad k \in \{0, 1\}$

When $g_2(x_2) = x_2^2$ we obtained (see lecture 5) the optimal policy and the optimal costs-to-go

$u_0 = -\tfrac{3}{5} x_0, \quad u_1 = -\tfrac{1}{2} x_1, \qquad J_1(x_1) = \tfrac{3}{2} x_1^2, \quad J_0(x_0) = \tfrac{8}{5} x_0^2$

We will now recover these functions using an alternative method (discretization) and then apply it to the problem with the non-quadratic terminal cost $g_2(x_2) = e^{x_2}$.

6. Example: discretization

Stage-decision problem: $\sum_{k=0}^{1} x_k^2 + u_k^2 + g_2(x_2), \quad x_{k+1} = x_k + u_k, \quad k \in \{0, 1\}$

Discretization: $x_k = \delta \bar{x}_k$, $u_k = \delta \bar{u}_k$, with $\bar{x}_k \in \{-N, -N+1, \ldots, N\}$ and $\bar{u}_k \in \{-M, -M+1, \ldots, M\}$

Discrete optimization problem: $\delta^2 \Big( \sum_{k=0}^{1} \bar{x}_k^2 + \bar{u}_k^2 \Big) + g_2(\delta \bar{x}_2), \quad \bar{x}_{k+1} = \bar{x}_k + \bar{u}_k, \quad k \in \{0, 1\}$

A sketch of the resulting grid DP is given below.
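To make the procedure concrete, here is a minimal Python sketch (not from the original slides; all names are illustrative) that runs the DP algorithm over the discretized grid, using the grid sizes $N = 100$, $M = 50$ and step $\delta = 0.05$ reported on the next slide:

```python
import numpy as np

# Grid parameters from the results slides
N, M, delta = 100, 50, 0.05
xbar = np.arange(-N, N + 1)   # discretized state levels: x = delta * xbar
ubar = np.arange(-M, M + 1)   # discretized input levels: u = delta * ubar

def discretized_dp(g2):
    """Grid DP for sum_{k=0}^{1} x_k^2 + u_k^2 + g2(x_2) with x_{k+1} = x_k + u_k."""
    J = g2(delta * xbar)                  # terminal costs-to-go J_2 on the grid
    policies = []
    for _ in range(2):                    # backward passes: k = 1, then k = 0
        J_prev, mu = np.full(xbar.shape, np.inf), np.zeros(xbar.shape)
        for i, xb in enumerate(xbar):
            xnext = xb + ubar             # candidate next states on the grid
            valid = np.abs(xnext) <= N    # discard transitions leaving the grid
            cost = delta**2 * (xb**2 + ubar[valid]**2) + J[xnext[valid] + N]
            j = np.argmin(cost)
            J_prev[i], mu[i] = cost[j], delta * ubar[valid][j]
        J = J_prev
        policies.insert(0, mu)
    return J, policies

# Quadratic terminal cost: should match J_0(x_0) = (8/5) x_0^2, u_0 = -(3/5) x_0
J0, (mu0, mu1) = discretized_dp(lambda x: x**2)
i = N + round(1.0 / delta)                # grid index of x_0 = 1
print(J0[i], mu0[i])                      # approximately 1.6 and -0.6
```

The same call with `lambda x: np.exp(x)` handles the non-quadratic terminal cost; the price is that the number of cost evaluations per stage grows with the grid sizes, and exponentially with the state dimension.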

7. Results

We indeed recover (approximately) the optimal solution when $g_2(x_2) = x_2^2$:

$J_0(x_0) = \tfrac{8}{5} x_0^2, \quad J_1(x_1) = \tfrac{3}{2} x_1^2, \quad J_2(x_2) = x_2^2, \qquad u_0 = -\tfrac{3}{5} x_0, \quad u_1 = -\tfrac{1}{2} x_1$

[Figure: costs-to-go $J_0$, $J_1$, $J_2$ and policies $\mu_0(x_0)$, $\mu_1(x_1)$ computed by discretization with $N = 100$, $M = 50$, $\delta = 0.05$, plotted on top of the optimal solution.]

8. Results

This encourages us to think that the optimal policy for $g_2(x_2) = e^{x_2}$ can also be (approximately) obtained with the same method.

[Figure: costs-to-go $J_0$, $J_1$, $J_2$ and policies $\mu_0(x_0)$, $\mu_1(x_1)$ computed by discretization with $N = 100$, $M = 50$, $\delta = 0.05$ for the terminal cost $g_2(x_2) = e^{x_2}$.]

This statement can be formalised (see LaValle's book), but we do not pursue this here.

9. Results

Note that if the initial state is $x_0 = 1$ the cost is $2.64$, corresponding to an initial control $u_0 = -0.7$ leading to the state $x_1 = 0.3$, and in turn to the control $u_1 = -0.45$.

[Figure: zoom of $J_0(x_0)$ around $x_0 = 1$ and of the policies $\mu_0(x_0)$, $\mu_1(x_1)$ around the optimal trajectory.]

We will (approximately) obtain these values later with a different method!

10. Discussion

• We discuss next how to extend the discretization method to the general case where the state belongs to $\mathbb{R}^n$, with the help of the following example.
• Consider the following toy problem: compute the force that moves a unitary mass 1 meter along a flat surface, from rest to rest, in 1 second, with minimum energy:

$\min \int_0^1 u(t)^2 \, dt \qquad \text{subject to} \quad \ddot{z}(t) = u(t), \quad z(0) = 0, \; z(1) = 1, \; \dot{z}(0) = \dot{z}(1) = 0$

• Later in the course we will learn the tools to find an optimal solution to this problem, which is $u(t) = 6 - 12t$, resulting in $v(t) = 6t - 6t^2$ and $z(t) = 3t^2 - 2t^3$ (a symbolic check is sketched below).
• To convert from continuous to discrete time we also need temporal discretization.
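As a sanity check (not part of the original slides), the candidate solution can be verified symbolically: integrating $u(t) = 6 - 12t$ twice recovers $v(t)$ and $z(t)$ and confirms the boundary conditions.

```python
import sympy as sp

t, s = sp.symbols('t s')
u = lambda t: 6 - 12 * t                   # candidate optimal force
v = sp.integrate(u(s), (s, 0, t))          # velocity, with v(0) = 0
z = sp.integrate(v.subs(t, s), (s, 0, t))  # position, with z(0) = 0

print(sp.expand(v))                        # 6*t - 6*t**2
print(sp.expand(z))                        # 3*t**2 - 2*t**3
print(v.subs(t, 1), z.subs(t, 1))          # 0 and 1: rest to rest, 1 meter
print(sp.integrate(u(s)**2, (s, 0, 1)))    # attained energy: 12
```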

11. Discretization

Continuous-time optimal control problems → [temporal discretization] → Stage decision problems → [state and input discretization] → Discrete optimization problems

12. Outline
• Discretization
  • Digital control and temporal discretization
  • State and input discretization
  • Application: minimum energy control of a vehicle
• Introduction to static optimization

13. Digital control

Physical system: $\dot{x}(t) = f_c(t, x(t), u(t))$. The digital control algorithm interacts with it through actuators (D/A) and sensors (A/D):
• Actuation (the control decision is held between sampling times): $u(t) = u_k$, $t \in [t_k, t_{k+1})$
• Sampled state: $x_k = x(t_k)$, with sampling period $\tau$ and sampling times $t_k := k\tau$
• Control law: $u_k = \mu_k(x_k)$

Discretization: the system "seen by the controller" is $x_{k+1} = f_k(x_k, u_k)$.
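A minimal simulation sketch of this loop (assuming a generic plant $f_c$ and control law $\mu$; the plant and feedback at the bottom are purely illustrative) integrates the plant between sampling instants with the input held constant:

```python
import numpy as np
from scipy.integrate import solve_ivp

def simulate_sampled_loop(f_c, mu, x0, tau, K):
    """Simulate dx/dt = f_c(t, x, u) under sampled feedback u_k = mu(k, x_k)."""
    x, xs, us = np.asarray(x0, float), [np.asarray(x0, float)], []
    for k in range(K):
        u_k = mu(k, x)                                # control decision at t_k = k * tau
        sol = solve_ivp(lambda t, x: f_c(t, x, u_k),  # D/A: input held at u_k over the interval
                        (k * tau, (k + 1) * tau), x)
        x = sol.y[:, -1]                              # A/D: state sampled at t_{k+1}
        xs.append(x)
        us.append(u_k)
    return np.array(xs), np.array(us)

# Illustrative example: double integrator with a simple stabilizing state feedback
f_c = lambda t, x, u: np.array([x[1], u])
mu = lambda k, x: -x[0] - x[1]
xs, us = simulate_sampled_loop(f_c, mu, [1.0, 0.0], tau=0.1, K=50)
```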

14. Temporal discretization

• Solve the differential equation $\dot{x}(t) = f_c(t, x(t), u(t))$ in $t \in [t_k, t_{k+1})$ for an initial condition $x(t_k) = x_k$ and a constant control input $u(t) = u_k$, and evaluate the solution at $t = t_{k+1}$ to obtain $x_{k+1} = x(t_{k+1}) = f(x_k, u_k)$.
• For example, if $\dot{x}(t) = A x(t) + B u(t)$, use the variation of constants formula $x(t) = e^{A(t - t_k)} x(t_k) + \int_{t_k}^{t} e^{A(t - s)} B u(s) \, ds$ to conclude that $x_{k+1} = A_d x_k + B_d u_k$, with $A_d = e^{A\tau}$ and $B_d = \int_0^\tau e^{A s} \, ds \, B$. A sketch of how to compute these matrices numerically is given below.
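Numerically, $A_d$ and $B_d$ can be obtained together from a single matrix exponential of an augmented matrix (a standard trick; a minimal sketch, assuming scipy is available):

```python
import numpy as np
from scipy.linalg import expm

def c2d(A, B, tau):
    """Exact zero-order-hold discretization.

    One exponential of the augmented matrix [[A, B], [0, 0]] yields both
    A_d = e^{A tau} (top-left block) and B_d = int_0^tau e^{As} ds B (top-right block).
    """
    n, m = B.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n], M[:n, n:] = A, B
    Md = expm(M * tau)
    return Md[:n, :n], Md[:n, n:]

# Double integrator from the next slide, tau = 0.2
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Ad, Bd = c2d(A, B, 0.2)
print(Ad)  # [[1, 0.2], [0, 1]]
print(Bd)  # [[0.02], [0.2]], i.e. [tau^2/2, tau]^T
```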

15. Double integrator

Model
$\begin{bmatrix} \dot{z}(t) \\ \dot{v}(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} z(t) \\ v(t) \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(t) \quad\Rightarrow\quad \begin{bmatrix} z_{k+1} \\ v_{k+1} \end{bmatrix} = \begin{bmatrix} 1 & \tau \\ 0 & 1 \end{bmatrix} \begin{bmatrix} z_k \\ v_k \end{bmatrix} + \begin{bmatrix} \frac{\tau^2}{2} \\ \tau \end{bmatrix} u_k$

with $u(t) = u_k$, $t \in [t_k, t_{k+1})$, $z_k = z(t_k)$, $v_k = v(t_k)$, $t_{k+1} - t_k = \tau$, $K\tau = 1$.

Cost
$\int_0^1 u(t)^2 \, dt = \sum_{k=0}^{K-1} \int_{t_k}^{t_{k+1}} u_k^2 \, dt = \tau \sum_{k=0}^{K-1} u_k^2$

Initial and terminal conditions
$z(0) = 0, \; z(1) = 1, \; \dot{z}(0) = \dot{z}(1) = 0 \;\Rightarrow\; z_0 = v_0 = 0, \; z_K = 1, \; v_K = 0$

Optimal solution (can be obtained by LQR)
$u_k = -R^{-1} B^\top \big( (A^\top)^{-1} \big)^{k+1} \lambda_0, \qquad \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix} = \begin{bmatrix} A & -B R^{-1} B^\top (A^\top)^{-1} \\ 0 & (A^\top)^{-1} \end{bmatrix}^K, \qquad \lambda_0 = M_{12}^{-1} \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

A numerical cross-check of this solution is sketched below.
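As an independent cross-check (not on the original slides), the same minimum-energy control can be computed as the least-norm input sequence reaching $[1, 0]^\top$ from the origin, which only needs the reachability matrix and a pseudo-inverse:

```python
import numpy as np

tau, K = 0.2, 5                           # K * tau = 1 second
A = np.array([[1.0, tau], [0.0, 1.0]])
B = np.array([[tau**2 / 2], [tau]])

# x_K = sum_{k=0}^{K-1} A^{K-1-k} B u_k: stack the reachability matrix
C = np.hstack([np.linalg.matrix_power(A, K - 1 - k) @ B for k in range(K)])

# The least-norm u with C u = [1, 0]^T minimizes sum u_k^2 (hence also tau * sum u_k^2)
u = np.linalg.pinv(C) @ np.array([1.0, 0.0])

print(u)                                  # discrete minimum-energy inputs
print(6 - 12 * np.arange(K) * tau)        # continuous-time solution u(t) at t_k
```

As $\tau$ decreases the two printed sequences become closer, which is the convergence illustrated on the next slides.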

16. Solution

[Figure: discrete solution $u_k$, $v_k$, $x_k$ plotted against the continuous-time optimal $u(t)$, $v(t)$, $x(t)$ for $\tau = 0.2$ (top row) and $\tau = 0.1$ (bottom row).]

17. Solution

[Figure: discrete solution $u_k$, $v_k$, $x_k$ plotted against the continuous-time optimal $u(t)$, $v(t)$, $x(t)$ for $\tau = 0.05$ (top row) and $\tau = 0.025$ (bottom row); the discrete solution approaches the continuous one as $\tau$ decreases.]

18. Discussion on temporal discretization

Temporal discretization
• It is also useful in other contexts, and we will exploit this later in the course.

Dynamic model
• For non-linear systems exact discretization may be hard. A numerical method (e.g. forward Euler) is then typically used: $x_{k+1} = x_k + \tau f(t_k, x_k, u_k)$.

Cost function
• In the problem formulation, the state and input variables may already be penalized at the sampling times: $\sum_{k=0}^{h-1} g_k(x(t_k), u(t_k)) + g_h(x(T)) = \sum_{k=0}^{h-1} g_k(x_k, u_k) + g_h(x_h)$.
• The cost may also result from discretizing a continuous-time cost function: $\int_0^T g_c(t, x(t), u(t)) \, dt = \sum_{k=0}^{h-1} \underbrace{\int_{t_k}^{t_{k+1}} g_c(t, x(t), u(t)) \, dt}_{g_k(x_k, u_k)}$.
• We can also use the approximation $\int_0^T g_c(t, x(t), u(t)) \, dt \approx \sum_{k=0}^{h-1} \tau \, g_c(k\tau, x_k, u_k)$, as in the sketch below.
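To illustrate the Euler and Riemann-sum approximations above (a sketch assuming a generic nonlinear plant $f$ and running cost $g_c$; all names are illustrative), together they turn a continuous-time problem into a stage decision problem:

```python
import numpy as np

def euler_stage_model(f, g_c, tau):
    """Approximate continuous dynamics and cost by a stage decision problem."""
    def f_k(k, x, u):                  # Euler step: x_{k+1} = x_k + tau * f(t_k, x_k, u_k)
        return x + tau * f(k * tau, x, u)
    def g_k(k, x, u):                  # Riemann term: tau * g_c(k * tau, x_k, u_k)
        return tau * g_c(k * tau, x, u)
    return f_k, g_k

# Illustrative example: pendulum-like system with a quadratic running cost
f = lambda t, x, u: np.array([x[1], -np.sin(x[0]) + u])
g_c = lambda t, x, u: x[0]**2 + u**2
f_k, g_k = euler_stage_model(f, g_c, tau=0.05)

x = np.array([1.0, 0.0])
print(f_k(0, x, 0.0), g_k(0, x, 0.0))  # one discretized step and its stage cost
```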
