Optimal Control and Dynamic Programming 4SC000 Q2 2017-2018 Duarte Antunes
Introduction

• We have seen in the two previous lectures (5, 6) that for stage decision problems with quadratic costs and linear models we can compute analytically the costs-to-go of the DP algorithm and the optimal policy.

• However, linear models with quadratic costs are one of the few cases where this happens. In fact, we typically cannot run DP analytically. Example of slide 11, lecture 5:

Cost: $\sum_{k=0}^{1} x_k^2 + u_k^2 + g_2(x_2)$, with $g_2(x_2) = e^{x_2}$    Model: $x_{k+1} = x_k + u_k$

• In the next three lectures we will discuss three alternative methods and their pros and cons:
  • Discretization (lecture 7)
  • Static optimization (lectures 7 and 8)
  • Approximate dynamic programming (lecture 9)
Outline

• Discretization
• Introduction to static optimization
Discretization

Stage decision problems  →  (state and input discretization)  →  Discrete optimization problems

• Stage decision problems can be approximated by a process called discretization (also referred to as sampling or quantization).
Example

$\sum_{k=0}^{1} x_k^2 + u_k^2 + g_2(x_2)$,   $x_{k+1} = x_k + u_k$,   $k \in \{0, 1\}$,   $g_2(x_2) = e^{x_2}$

When $g_2(x_2) = x_2^2$ we obtained (see lecture 5) the optimal policy and the optimal costs-to-go

$u_0 = -\tfrac{3}{5} x_0$,   $u_1 = -\tfrac{1}{2} x_1$,   $J_0(x_0) = \tfrac{8}{5} x_0^2$,   $J_1(x_1) = \tfrac{3}{2} x_1^2$

We will now recover these functions using an alternative method (discretization) and then apply it to the problem with a non-quadratic terminal cost.
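For reference, these expressions follow from the two-step DP recursion of lecture 5:

$$
\begin{aligned}
J_2(x_2) &= x_2^2,\\
J_1(x_1) &= \min_{u_1}\; x_1^2 + u_1^2 + (x_1+u_1)^2 = \tfrac{3}{2}x_1^2, \quad u_1 = -\tfrac{1}{2}x_1 \;\; (\text{from } 2u_1 + 2(x_1+u_1) = 0),\\
J_0(x_0) &= \min_{u_0}\; x_0^2 + u_0^2 + \tfrac{3}{2}(x_0+u_0)^2 = \tfrac{8}{5}x_0^2, \quad u_0 = -\tfrac{3}{5}x_0 \;\; (\text{from } 2u_0 + 3(x_0+u_0) = 0).
\end{aligned}
$$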
Example: discretization

Stage-decision problem:   $\sum_{k=0}^{1} x_k^2 + u_k^2 + g_2(x_2)$,   $x_{k+1} = x_k + u_k$,   $k \in \{0, 1\}$

Discretization:   $x_k = \delta \bar{x}_k$,   $u_k = \delta \bar{u}_k$,   $\bar{x}_k \in \{-N, -N+1, \dots, N\}$,   $\bar{u}_k \in \{-M, -M+1, \dots, M\}$

Discrete optimization problem:   $\delta^2 \left( \sum_{k=0}^{1} \bar{x}_k^2 + \bar{u}_k^2 \right) + g_2(\delta \bar{x}_2)$,   $\bar{x}_{k+1} = \bar{x}_k + \bar{u}_k$,   $k \in \{0, 1\}$

A sketch of the resulting DP over the grid follows below.
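A minimal sketch of how this discretized DP could be implemented, using the grid parameters of the next slide (N = 100, M = 50, δ = 0.05) as defaults; the function name and implementation details are illustrative, not from the lecture:

```python
import numpy as np

def discretized_dp(g2, N=100, M=50, delta=0.05):
    """DP for the discretized two-stage problem
        min  sum_{k=0}^{1} (x_k^2 + u_k^2) + g2(x_2),   x_{k+1} = x_k + u_k,
    with x_k = delta*xbar_k, u_k = delta*ubar_k,
    xbar_k in {-N,...,N}, ubar_k in {-M,...,M}."""
    ub = np.arange(-M, M + 1)                  # integer input grid
    x = delta * np.arange(-N, N + 1)           # physical state grid (index i <-> xbar = i - N)
    J = {2: g2(x)}                             # terminal cost-to-go on the grid
    mu = {}
    for k in (1, 0):
        Jk, muk = np.empty_like(x), np.empty_like(x)
        for i in range(x.size):
            jnext = i + ub                     # grid index of xbar_k + ubar_k
            ok = (jnext >= 0) & (jnext < x.size)   # discard transitions leaving the grid
            u = delta * ub[ok]
            cost = x[i] ** 2 + u ** 2 + J[k + 1][jnext[ok]]
            best = np.argmin(cost)
            Jk[i], muk[i] = cost[best], u[best]
        J[k], mu[k] = Jk, muk
    return x, J, mu

# Quadratic terminal cost: the tables approximate J0 = (8/5) x^2, J1 = (3/2) x^2
x, J, mu = discretized_dp(lambda x2: x2 ** 2)
# Exponential terminal cost of the running example
x, J, mu = discretized_dp(np.exp)
```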
Results

When $g_2(x_2) = x_2^2$ we indeed recover (approximately) the optimal policy and costs-to-go.

[Figure: discretization (N = 100, M = 50, δ = 0.05) versus the optimal solution, plotting $J_0(x_0) = \tfrac{8}{5}x_0^2$, $J_1(x_1) = \tfrac{3}{2}x_1^2$, $J_2(x_2) = x_2^2$, and the policies $u_0 = -\tfrac{3}{5}x_0$, $u_1 = -\tfrac{1}{2}x_1$; the discretized curves closely match the optimal ones.]
Results

This encourages us to think that the optimal policy when $g_2(x_2) = e^{x_2}$ can also be (approximately) obtained with the same method.

[Figure: discretized costs-to-go $J_0(x_0)$, $J_1(x_1)$, $J_2(x_2)$ and policies $\mu_0(x_0)$, $\mu_1(x_1)$ computed with N = 100, M = 50, δ = 0.05.]

This statement can be formalised (see LaValle's book) but we do not pursue this here.
Results

Note that the cost if the initial state is $x_0 = 1$ is approximately $2.64$, and corresponds to an initial control $u_0 = -0.7$ leading to state $x_1 = 0.3$, and in turn to the control $u_1 = -0.45$.

[Figure: zoom of $J_0(x_0)$ near $x_0 = 1$ and of the policies $\mu_0(x_0)$, $\mu_1(x_1)$ around the optimal trajectory.]

We will (approximately) obtain these values later with a different method! A quick brute-force check of these numbers is sketched below.
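Since the horizon is only two stages, the values above can be verified by enumerating all input pairs on the same grid; a minimal sketch:

```python
import numpy as np

# Brute-force check over the same input grid (delta = 0.05, M = 50): with
# g2(x2) = exp(x2) and x0 = 1, the best two-move sequence costs about 2.64,
# attained with u0 = -0.7 (so x1 = 0.3) and u1 = -0.45.
delta, M = 0.05, 50
u = delta * np.arange(-M, M + 1)
u0, u1 = np.meshgrid(u, u, indexing="ij")
x0 = 1.0
x1 = x0 + u0
x2 = x1 + u1
cost = x0 ** 2 + u0 ** 2 + x1 ** 2 + u1 ** 2 + np.exp(x2)
i, j = np.unravel_index(np.argmin(cost), cost.shape)
print(cost[i, j], u[i], u[j])   # -> approx 2.64, -0.7, -0.45
```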
Discussion

• We discuss next how to extend the discretization method to the general case where the state belongs to $\mathbb{R}^n$, with the help of the following example.

• Consider the following toy problem: compute the force to move a unitary mass 1 meter along a flat surface, from rest to rest, in 1 second, with minimum energy:

$\min \int_0^1 u(t)^2 \, dt$,   $\ddot{z}(t) = u(t)$,   $z(0) = 0$, $z(1) = 1$, $\dot{z}(0) = \dot{z}(1) = 0$

• Later in the course we will learn the tools to find an optimal solution to this problem, which is $u(t) = 6 - 12t$, resulting in $v(t) = 6t - 6t^2$ and $z(t) = 3t^2 - 2t^3$ (a quick numerical check follows below).

• To convert from continuous to discrete time we also need temporal discretization.
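A short sanity check of the claimed optimum (not a derivation, just verifying the boundary conditions and the cost value, which works out to 12):

```python
import numpy as np

# u(t) = 6 - 12t integrates to v(t) = 6t - 6t^2 and z(t) = 3t^2 - 2t^3,
# which satisfy z(0) = 0, z(1) = 1, v(0) = v(1) = 0; the energy
# int_0^1 u(t)^2 dt evaluates to 12.
t = np.linspace(0.0, 1.0, 100001)
u = 6 - 12 * t
v = 6 * t - 6 * t ** 2
z = 3 * t ** 2 - 2 * t ** 3
print(z[0], z[-1], v[0], v[-1])     # 0.0  1.0  0.0  0.0
print(np.mean(u ** 2))              # approx 12 (Riemann estimate of the energy)
```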
Discretization

Continuous-time optimal control problems  →  (temporal discretization)  →  Stage decision problems  →  (state and input discretization)  →  Discrete optimization problems
Outline

• Discretization
• Digital control and temporal discretization
• State and input discretization
• Application: minimum energy control of a vehicle
• Introduction to static optimization
Digital control

[Diagram: digital control loop. The physical system $\dot{x}(t) = f_c(t, x(t), u(t))$ is driven through actuators by the D/A-converted actuation $u(t) = u_k$, $t \in [t_k, t_{k+1})$; its state $x(t)$ is measured through sensors and A/D-converted into the sampled state $x_k = x(t_k)$, $t_k := k\tau$, where $\tau$ is the sampling period. A digital algorithm implements the control law $u_k = \mu_k(x_k)$.]

Discretization: the system "seen by the controller" is $x_{k+1} = f_k(x_k, u_k)$. A simulation sketch of this loop is given below.
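A minimal sketch of this sampled-data loop; the function name, gains, and the inner Euler integration are illustrative assumptions, not from the lecture:

```python
import numpy as np

def simulate_digital_loop(f_c, mu, x0, tau=0.1, K=50, substeps=100):
    """Simulate the loop on this slide: the plant xdot = f_c(t, x, u) evolves in
    continuous time (approximated here by small Euler steps), the input is held
    constant between sampling instants t_k = k*tau (zero-order hold), and the
    control law mu only sees the sampled state x_k = x(t_k)."""
    x = np.asarray(x0, dtype=float)
    xs, us = [x.copy()], []
    h = tau / substeps                       # inner integration step
    for k in range(K):
        u_k = mu(k, x)                       # decision based on the sample x_k
        for i in range(substeps):            # plant evolves between samples
            t = k * tau + i * h
            x = x + h * f_c(t, x, u_k)
        xs.append(x.copy())
        us.append(u_k)
    return np.array(xs), np.array(us)

# Example: double integrator with an illustrative state feedback
xs, us = simulate_digital_loop(lambda t, x, u: np.array([x[1], u]),
                               lambda k, x: -x[0] - 1.5 * x[1],
                               x0=[1.0, 0.0])
```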
Temporal discretization

• Solve the differential equation $\dot{x}(t) = f_c(t, x(t), u(t))$ in $t \in [t_k, t_{k+1})$ for an initial condition $x(t_k) = x_k$ and a constant control input $u(t) = u_k$, and evaluate at $t = t_{k+1}$:

$x_{k+1} = x(t_{k+1}) = f(x_k, u_k)$

• For example, if $\dot{x}(t) = Ax(t) + Bu(t)$, use the variation of constants formula

$x(t) = e^{A(t - t_k)} x(t_k) + \int_{t_k}^{t} e^{A(t-s)} B u(s) \, ds$

to conclude that $x_{k+1} = A_d x_k + B_d u_k$, where $A_d = e^{A\tau}$ and $B_d = \int_0^\tau e^{As} \, ds \, B$. These matrices can be computed numerically, as sketched below.
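A minimal sketch for computing $A_d$ and $B_d$, using the standard trick of exponentiating an augmented matrix (the function name is illustrative):

```python
import numpy as np
from scipy.linalg import expm

def zoh_discretize(A, B, tau):
    """Exact zero-order-hold discretization: A_d = e^{A tau} and
    B_d = (int_0^tau e^{As} ds) B, obtained from one matrix exponential
    of the augmented matrix [[A, B], [0, 0]]."""
    n, m = B.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n], M[:n, n:] = A, B
    E = expm(M * tau)
    return E[:n, :n], E[:n, n:]

# Double integrator of the next slide: recovers A_d = [[1, tau], [0, 1]]
# and B_d = [tau^2/2, tau]
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Ad, Bd = zoh_discretize(A, B, tau=0.2)
```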
Double integrator

Model:

$\begin{bmatrix} \dot{z}(t) \\ \dot{v}(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} z(t) \\ v(t) \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(t)$,   $u(t) = u_k$, $t \in [t_k, t_{k+1})$   →   $\begin{bmatrix} z_{k+1} \\ v_{k+1} \end{bmatrix} = \begin{bmatrix} 1 & \tau \\ 0 & 1 \end{bmatrix} \begin{bmatrix} z_k \\ v_k \end{bmatrix} + \begin{bmatrix} \frac{\tau^2}{2} \\ \tau \end{bmatrix} u_k$

with $z_k = z(t_k)$, $v_k = v(t_k)$, $t_{k+1} - t_k = \tau$, $K\tau = 1$.

Cost:

$\int_0^1 u(t)^2 \, dt = \sum_{k=0}^{K-1} \int_{t_k}^{t_{k+1}} u_k^2 \, dt = \tau \sum_{k=0}^{K-1} u_k^2$

Initial and terminal conditions: $z(0) = 0$, $z(1) = 1$, $\dot{z}(0) = \dot{z}(1) = 0$, i.e., $z_0 = v_0 = v_K = 0$, $z_K = 1$.

Optimal solution (can be obtained by LQR):

$u_k = -R^{-1} B^\top ((A^\top)^{-1})^{k+1} \lambda_0$,   where   $\begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix} = \begin{bmatrix} A & -BR^{-1}B^\top (A^\top)^{-1} \\ 0 & (A^\top)^{-1} \end{bmatrix}^K$,   $\lambda_0 = M_{12}^{-1} \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

A numerical sketch of this formula is given below.
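A sketch of evaluating this closed form numerically. The block matrix follows the standard costate argument (with no state cost, $\lambda_{k+1} = (A^\top)^{-1}\lambda_k$ and $u_k = -R^{-1}B^\top\lambda_{k+1}$); the choice $\tau = 0.05$ and the variable names are illustrative:

```python
import numpy as np

# Stacking (x_k, lambda_k) gives the block matrix above; x_0 = 0 and
# x_K = [1, 0]^T then determine lambda_0 through M12.
tau = 0.05
K = round(1.0 / tau)
A = np.array([[1.0, tau], [0.0, 1.0]])
B = np.array([[tau ** 2 / 2], [tau]])
R = np.array([[tau]])                             # cost is tau * sum_k u_k^2
AiT = np.linalg.inv(A.T)
H = np.block([[A, -B @ np.linalg.inv(R) @ B.T @ AiT],
              [np.zeros((2, 2)), AiT]])
M12 = np.linalg.matrix_power(H, K)[:2, 2:]
lam = np.linalg.solve(M12, np.array([1.0, 0.0]))  # lambda_0
u = []
for k in range(K):
    lam = AiT @ lam                               # lambda_{k+1}
    u.append((-np.linalg.inv(R) @ B.T @ lam).item())
# u[k] tracks the continuous-time optimum u(t) = 6 - 12t near t = k*tau
```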
Solution

[Figure: discrete-time optimal inputs $u_k$, velocities $v_k$, and positions $z_k$ (markers) compared with the continuous-time optimum $u(t)$, $v(t)$, $z(t)$, for $\tau = 0.2$ (top row) and $\tau = 0.1$ (bottom row).]
Solution

[Figure: the same comparison for $\tau = 0.05$ (top row) and $\tau = 0.025$ (bottom row); as $\tau$ decreases, the discrete-time solution approaches the continuous-time one.]
Discussion on temporal discretization

• Temporal discretization is also useful in other contexts and we will exploit this later in the course.

• Dynamic model: for non-linear systems exact discretization may be hard, so a numerical method (e.g. forward Euler) is typically used:

$x_{k+1} = x_k + \tau f(t_k, x_k, u_k)$

• Cost function: in the problem formulation, the state and input variables may already be penalized at the sampling times,

$\sum_{k=0}^{h-1} g_k(x(t_k), u(t_k)) + g_h(x(T)) = \sum_{k=0}^{h-1} g_k(x_k, u_k) + g_h(x_h)$

• The cost may also result from discretizing a continuous-time cost function,

$\int_0^T g_c(t, x(t), u(t)) \, dt = \sum_{k=0}^{h-1} \underbrace{\int_{t_k}^{t_{k+1}} g_c(t, x(t), u(t)) \, dt}_{g_k(x_k, u_k)}$

• We can also use the approximation $\int_0^T g_c(t, x(t), u(t)) \, dt \approx \sum_{k=0}^{h-1} \tau \, g_c(k\tau, x_k, u_k)$.

Both approximations are sketched in code below.
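A minimal sketch of the two approximations above (function names are illustrative):

```python
def euler_step(f, t_k, x_k, u_k, tau):
    """Forward-Euler discretization of the dynamics:
    x_{k+1} = x_k + tau * f(t_k, x_k, u_k)."""
    return x_k + tau * f(t_k, x_k, u_k)

def riemann_cost(g_c, xs, us, tau):
    """Riemann-sum approximation of the running cost:
    int_0^T g_c(t, x(t), u(t)) dt  ~=  sum_k tau * g_c(k*tau, x_k, u_k)."""
    return sum(tau * g_c(k * tau, x_k, u_k)
               for k, (x_k, u_k) in enumerate(zip(xs, us)))
```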