

  1. Optimal Control and Dynamic Programming 4SC000 Q2 2017-2018 Duarte Antunes

  2. Outline • Stochastic dynamic programming and linear quadratic control • Output feedback linear quadratic control • Separation principle • Kalman filter • LQG design

  3. Stochastic formulation
Stochastic disturbances $w_k \in \mathbb{R}^{n_w}$.
Dynamic model: $x_{k+1} = f_k(x_k, u_k, w_k)$
Cost: $\sum_{k=0}^{h-1} g_k(x_k, u_k, w_k) + g_h(x_h)$
Find a policy $\pi = \{\mu_0, \ldots, \mu_{h-1}\}$, $u_k = \mu_k(x_k)$, that minimizes
$J_\pi(x_0) = E\big[\textstyle\sum_{k=0}^{h-1} g_k(x_k, \mu_k(x_k), w_k) + g_h(x_h)\big]$
We assume that the stochastic disturbances have zero mean and are statistically independent (white).

  4. Stochastic dynamic programming algorithm
Start with $J_h(x_h) = g_h(x_h)$ for every $x_h \in X_h$, and for each decision stage, starting from the last and moving backwards, $k \in \{h-1, h-2, \ldots, 0\}$, compute $J_k$ and $\mu_k$ from the DP equation
$J_k(x_k) = \min_{u_k \in U_k(x_k)} E_{w_k}\big[\, g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k)) \,\big]$
and set $\mu_k(x_k) = u_k$, where $u_k$ is the minimizer in the DP equation. Then $\{\mu_0, \ldots, \mu_{h-1}\}$ is an optimal policy.
To be more precise, we can write on the right-hand side of the DP equation
$E_{w_k}\big[\, g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k)) \mid x_k \,\big]$
which reinforces the fact that $x_k$ is assumed to be constant while computing the expected value and taking the min.
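As a concrete illustration, here is a minimal sketch of the stochastic DP recursion for finite state, control, and disturbance sets. The grid, dynamics $f$, and stage costs $g$ below are illustrative assumptions, not taken from the slides.

```python
# Minimal stochastic DP sketch: scalar integer state, two disturbance values.
h = 5                                    # horizon
X = list(range(-5, 6))                   # state grid X_k
U = [-1, 0, 1]                           # control set U_k
W = [(-1, 0.5), (1, 0.5)]                # disturbance values and their probabilities

def f(x, u, w):                          # dynamics x_{k+1} = f_k(x_k, u_k, w_k)
    return max(-5, min(5, x + u + w))

def g(x, u, w):                          # stage cost g_k(x_k, u_k, w_k)
    return x**2 + u**2

def g_h(x):                              # terminal cost g_h(x_h)
    return x**2

J = {x: g_h(x) for x in X}               # J_h(x_h) = g_h(x_h)
policy = []
for k in reversed(range(h)):             # k = h-1, ..., 0, moving backwards
    J_new, mu = {}, {}
    for x in X:
        # E_w[ g(x,u,w) + J_{k+1}(f(x,u,w)) ] for each u, then minimize over u
        costs = {u: sum(p * (g(x, u, w) + J[f(x, u, w)]) for w, p in W) for u in U}
        mu[x] = min(costs, key=costs.get)
        J_new[x] = costs[mu[x]]
    J, policy = J_new, [mu] + policy      # policy = {mu_0, ..., mu_{h-1}}

print(policy[0][3], J[3])                 # optimal first decision and cost-to-go at x_0 = 3
```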

  5. Linear quadratic control
Dynamic model: $x_{k+1} = A_k x_k + B_k u_k + w_k$, $k \in \{0, \ldots, h-1\}$, with $E[w_k] = 0$ (zero mean) and $w_k$ white.
Cost function:
$\sum_{k=0}^{h-1} \begin{bmatrix} x_k^\top & u_k^\top \end{bmatrix} \begin{bmatrix} Q_k & S_k \\ S_k^\top & R_k \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} + x_h^\top Q_h x_h$
Find a policy $\pi = \{\mu_0, \ldots, \mu_{h-1}\}$, $u_k = \mu_k(x_k)$, which minimizes
$E\Big[\sum_{k=0}^{h-1} \begin{bmatrix} x_k^\top & u_k^\top \end{bmatrix} \begin{bmatrix} Q_k & S_k \\ S_k^\top & R_k \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} + x_h^\top Q_h x_h\Big]$

  6. Optimal policy
Theorem: The optimal policy is $u_k = K_k x_k$, $k \in \{0, \ldots, h-1\}$, where
$K_k = -\big(B_k^\top P_{k+1} B_k + R_k\big)^{-1}\big(S_k^\top + B_k^\top P_{k+1} A_k\big)$
$P_k = A_k^\top P_{k+1} A_k + Q_k - \big(S_k + A_k^\top P_{k+1} B_k\big)\big(B_k^\top P_{k+1} B_k + R_k\big)^{-1}\big(S_k^\top + B_k^\top P_{k+1} A_k\big)$
with $P_h = Q_h$, $k \in \{h-1, \ldots, 0\}$ (Riccati iterations), and the resulting expected cost is given by
$J_0(x_0) = x_0^\top P_0 x_0 + \sum_{k=0}^{h-1} \operatorname{trace}\big(P_{k+1} E[w_k w_k^\top]\big)$
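The theorem translates directly into a short backward recursion. A minimal sketch, with the time-varying matrices replaced by constant $A, B, Q, R, S$ for brevity (an assumption made here, not a restriction of the theorem):

```python
import numpy as np

def lq_riccati(A, B, Q, R, S, Qh, h):
    """Backward Riccati iterations: returns gains K_0..K_{h-1} and matrices P_0..P_h."""
    P = [None] * (h + 1)
    K = [None] * h
    P[h] = Qh                                         # P_h = Q_h
    for k in reversed(range(h)):                      # k = h-1, ..., 0
        M = B.T @ P[k + 1] @ B + R                    # B' P_{k+1} B + R
        L = S.T + B.T @ P[k + 1] @ A                  # S' + B' P_{k+1} A
        K[k] = -np.linalg.solve(M, L)                 # K_k = -M^{-1} L
        P[k] = A.T @ P[k + 1] @ A + Q - L.T @ np.linalg.solve(M, L)
    return K, P

def expected_cost(x0, P, W):
    """x0' P_0 x0 + sum_{k=0}^{h-1} trace(P_{k+1} W_k), with W_k = E[w_k w_k']."""
    h = len(P) - 1
    return float(x0 @ P[0] @ x0) + sum(np.trace(P[k + 1] @ W[k]) for k in range(h))
```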

  7. Discussion
• The optimal policy is exactly the same as in the case $w_k = 0$, and to obtain the expected cost we just need to add a constant.
• Recall the approaches to cope with stochastic disturbances:
  • open loop: use the decisions of the optimal path
  • closed loop: use the policy from deterministic DP
  • closed loop: use the policy from stochastic DP
• The result of the previous slide says that when the dynamic model is linear and the cost is quadratic, deterministic DP and stochastic DP yield the same policy.
• When this happens we say that we have certainty equivalence.

  8. Discussion
• The result carries through if we consider an infinite-horizon cost, under the same assumptions that we have considered in the deterministic case:
$E\Big[\sum_{k=0}^{\infty} x_k^\top Q x_k + 2 x_k^\top S u_k + u_k^\top R u_k\Big]$
(optimal policy $u_k = \bar{K} x_k$, $\bar{K} = -\big(B^\top \bar{P} B + R\big)^{-1}\big(S^\top + B^\top \bar{P} A\big)$, where $\bar{P}$ satisfies the algebraic Riccati equation).
• However, now we need the extra assumption that the cost is bounded. If this is the case (e.g. if $w_\ell = 0$ for $\ell > L > 0$), it is given by
$x_0^\top \bar{P} x_0 + \sum_{k=0}^{\infty} \operatorname{trace}\big(\bar{P}\, E[w_k w_k^\top]\big)$
• If the cost is not bounded we can consider an alternative (average) cost
$\lim_{T \to \infty} \frac{1}{T}\, E\Big[\sum_{k=0}^{T-1} x_k^\top Q x_k + 2 x_k^\top S u_k + u_k^\top R u_k\Big]$
which converges to $\operatorname{trace}(\bar{P} W)$, assuming $W = E[w_k w_k^\top]$ for all $k$.
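For the infinite-horizon case, $\bar{P}$ can be obtained by iterating the same Riccati recursion until it converges. A simple fixed-point sketch (scipy.linalg.solve_discrete_are would be an alternative, but the plain iteration keeps the link to the recursion explicit):

```python
import numpy as np

def dare_fixed_point(A, B, Q, R, S, iters=5000, tol=1e-12):
    """Iterate the Riccati recursion until it converges to the ARE solution P-bar."""
    P = Q.copy()
    for _ in range(iters):
        M = B.T @ P @ B + R
        L = S.T + B.T @ P @ A
        P_next = A.T @ P @ A + Q - L.T @ np.linalg.solve(M, L)
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    K = -np.linalg.solve(B.T @ P @ B + R, S.T + B.T @ P @ A)   # K-bar
    return P, K
```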

  9. Example
Double integrator considered previously:
$x_{k+1} = \begin{bmatrix} 1 & \tau \\ 0 & 1 \end{bmatrix} x_k + \begin{bmatrix} \tau^2/2 \\ \tau \end{bmatrix} u_k$, with $\tau = 0.2$
Cost: $\sum_{k=0}^{h-1} \big(x_k^\top Q x_k + u_k^\top R u_k\big) + x_h^\top Q_h x_h$, with $Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $R = 1$.
Optimal policy: $u_k = K x_k$, $k \ge 0$, with $K = \begin{bmatrix} -0.8412 & -1.54 \end{bmatrix}$.
[Figure: optimal control input $u(t)$ and states $y(t)$, $v(t)$ for $x_0 = [y_0\ v_0]^\top = [1\ 0]^\top$.]
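Plugging the double-integrator data into the fixed-point sketch above, the resulting gain should come out close to the value reported on this slide:

```python
import numpy as np

tau = 0.2
A = np.array([[1.0, tau], [0.0, 1.0]])
B = np.array([[tau**2 / 2], [tau]])
Q = np.eye(2)
R = np.array([[1.0]])
S = np.zeros((2, 1))                       # no cross term in this example

P_bar, K_bar = dare_fixed_point(A, B, Q, R, S)
print(K_bar)                               # expected to be close to [-0.8412, -1.54]
print(P_bar)                               # expected to be close to the P-bar quoted on a later slide
```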

  10. Effects of disturbances
Consider now a disturbance at time 2 sec ($k = 10$):
$x_{k+1} = \begin{bmatrix} 1 & \tau \\ 0 & 1 \end{bmatrix} x_k + \begin{bmatrix} \tau^2/2 \\ \tau \end{bmatrix} u_k + w_k$, with $w_k = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ if $k \in \mathbb{N}_0 \setminus \{10\}$ and $w_k = \begin{bmatrix} 0 \\ a \end{bmatrix}$ if $k = 10$,
where $a$ is a uniform random variable in the interval $[-0.5,\ 0.5]$.
[Figure: open-loop and closed-loop responses $y(t)$, $v(t)$, $u(t)$.]

  11. Computing the expected cost
We can compute the expected cost based on the expression
$x_0^\top \bar{P} x_0 + \sum_{k=0}^{\infty} \operatorname{trace}\big(\bar{P}\, E[w_k w_k^\top]\big) = x_0^\top \bar{P} x_0 + \operatorname{trace}\Big(\bar{P} \begin{bmatrix} 0 & 0 \\ 0 & E[a^2] \end{bmatrix}\Big)$
where $\bar{P}$ is the solution of the algebraic Riccati equation, which for the considered parameters is
$\bar{P} = \begin{bmatrix} 9.1890 & 5.0249 \\ 5.0249 & 9.2324 \end{bmatrix}$
and we used the fact that $w_k = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ if $k \in \mathbb{N}_0 \setminus \{10\}$, $w_k = \begin{bmatrix} 0 \\ a \end{bmatrix}$ if $k = 10$, and
$E[a^2] = \int_{-0.5}^{0.5} t^2\, dt = 1/12$
Since $x_0 = [1\ 0]^\top$, we obtain the cost $9.95$.
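The number can be checked directly from the formula above (a small sketch; with the rounded entries of $\bar{P}$ the result comes out at about 9.96 rather than exactly 9.95):

```python
import numpy as np

P_bar = np.array([[9.1890, 5.0249], [5.0249, 9.2324]])   # ARE solution quoted on this slide
x0 = np.array([1.0, 0.0])                                 # x_0 = [1 0]'
E_a2 = 1.0 / 12.0                                         # E[a^2] for a ~ U[-0.5, 0.5]
W10 = np.array([[0.0, 0.0], [0.0, E_a2]])                 # E[w_10 w_10'], the only nonzero term

cost = float(x0 @ P_bar @ x0) + np.trace(P_bar @ W10)
print(cost)                                               # about 9.96 with the rounded P_bar; the slide reports 9.95
```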

  12. Computing the expected value
Alternatively, we can simulate the system several times, computing the cost for each simulation and then averaging (Monte Carlo method).
Simulation 1: cost $9.534$. Simulation 2: cost $9.097$.
[Figure: $y(t)$, $v(t)$, $u(t)$ for each simulation.]
Average cost of 5000 simulations: $9.95$.
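A sketch of this Monte Carlo check: simulate the closed loop with the disturbance at $k = 10$, accumulate the quadratic cost, and average over many runs (the simulation length of 100 steps is an assumption, chosen so the transient has died out):

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 0.2
A = np.array([[1.0, tau], [0.0, 1.0]])
B = np.array([tau**2 / 2, tau])
Q, R = np.eye(2), 1.0
K = np.array([-0.8412, -1.54])            # optimal gain from the example slide
x0 = np.array([1.0, 0.0])
h = 100                                   # simulation length (assumption: long enough to settle)

def run_once():
    x, cost = x0.copy(), 0.0
    for k in range(h):
        u = float(K @ x)
        cost += x @ Q @ x + R * u**2
        w = np.array([0.0, rng.uniform(-0.5, 0.5)]) if k == 10 else np.zeros(2)
        x = A @ x + B * u + w
    return cost

costs = [run_once() for _ in range(5000)]
print(np.mean(costs))                     # should be close to the 9.95 reported on the slide
```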

  13. Outline • Stochastic dynamic programming and linear quadratic control • Output feedback linear quadratic control • Separation principle • Kalman filter • LQG design

  14. Problem formulation
Stochastic disturbances $w_k$; the initial state is unknown; the output is corrupted by noise.
Dynamic model: $x_{k+1} = f_k(x_k, u_k, w_k)$
Cost: $\sum_{k=0}^{h-1} g_k(x_k, u_k, w_k) + g_h(x_h)$
Output: $y_k = h_k(x_k, u_{k-1}, n_k)$
Information set: $I_k = (y_0, y_1, \ldots, y_k, u_0, u_1, \ldots, u_{k-1})$ for $k \ge 1$, $I_0 = (y_0)$.
Find a policy $\pi = \{\mu_0, \ldots, \mu_{h-1}\}$, $u_k = \mu_k(I_k)$, which minimizes
$J_\pi = E\big[\textstyle\sum_{k=0}^{h-1} g_k(x_k, \mu_k(I_k), w_k) + g_h(x_h)\big]$
We assume that the initial state, the stochastic disturbances, and the output noise have zero mean, are statistically independent, and have a known probability distribution.

  15. First approach
We can reformulate a partial information optimal control problem as a standard full information optimal control problem by considering the state to be the information set. The problem is that the state space dimension increases exponentially.
[Figure: the information set grows with each stage: $I_0 = (y_0)$, $I_1 = (y_0, y_1, u_0)$, $I_2 = (y_0, y_1, y_2, u_0, u_1)$.]

  16. Second approach
It is possible to show that knowledge of the probability distribution of the state given the information obtained so far, denoted by $P_{x_k \mid I_k}$, is sufficient to determine optimal decisions:
$u_k = \mu_k(P_{x_k \mid I_k})$
[Figure: belief states $P_{x_0 \mid I_0}$, $P_{x_1 \mid I_1}$, $P_{x_2 \mid I_2}$ in probability space.]
The state space dimension is now fixed. However, it is typically not easy to store $P_{x_k \mid I_k}$.
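To make the idea concrete, here is a minimal sketch of how $P_{x_k \mid I_k}$ can be propagated for a small finite-state example (the transition and observation probabilities below are illustrative assumptions, not from the slides):

```python
import numpy as np

# Illustrative finite-state example: 3 states, 2 possible measurements.
T = np.array([[0.8, 0.2, 0.0],            # T[i, j] = P(x_{k+1} = j | x_k = i), for a fixed u_k
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
O = np.array([[0.9, 0.1],                 # O[i, y] = P(y_k = y | x_k = i)
              [0.5, 0.5],
              [0.1, 0.9]])

def measurement_update(b_pred, y):
    """From P(x_k | I_{k-1}) and a new measurement y_k to P(x_k | I_k) (Bayes rule)."""
    b = b_pred * O[:, y]
    return b / b.sum()

b = np.array([1/3, 1/3, 1/3])             # prior on x_0, before any measurement
for y in [0, 0, 1]:                       # measurements y_0, y_1, y_2
    b = measurement_update(b, y)          # P(x_k | I_k): this is all the controller needs
    # a decision u_k = mu_k(b) would be taken here
    b = b @ T                             # time update to P(x_{k+1} | I_k) for the next stage
print(b)                                   # predicted belief after the last time update
```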

  17. Problem formulation
Dynamic model: $x_{k+1} = A x_k + B u_k + w_k$, $k \in \{0, \ldots, h-1\}$, with $E[w_k] = 0$ (zero mean) and $w_k$ independent.
Cost function:
$\sum_{k=0}^{h-1} \begin{bmatrix} x_k^\top & u_k^\top \end{bmatrix} \begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} + x_h^\top Q_h x_h$
Output equation: $y_k = C x_k + n_k$, with noise $n_k$ zero mean and independent.
Information set: $I_k = (y_0, y_1, \ldots, y_k, u_0, u_1, \ldots, u_{k-1})$
Find a policy $\pi = \{\mu_0, \ldots, \mu_{h-1}\}$, $u_k = \mu_k(I_k)$, which minimizes
$E\Big[\sum_{k=0}^{h-1} \begin{bmatrix} x_k^\top & u_k^\top \end{bmatrix} \begin{bmatrix} Q & S \\ S^\top & R \end{bmatrix} \begin{bmatrix} x_k \\ u_k \end{bmatrix} + x_h^\top Q_h x_h\Big]$
