LQ optimal control for partially specified input noise


  1. LQ optimal control for partially specified input noise
     Alexander Erreygers, Jasper De Bock, Gert de Cooman, Arthur Van Camp
     Ghent University
     28th European Conference on Operational Research

  2. Scalar linear systems

     The controller is interested in the system

     $$X_{k+1} = a X_k + b u_k + W_k \tag{1}$$

     for $k \in \mathcal{N} := \{0, 1, \dots, n\}$, where $n \in \mathbb{N}$, $a \in \mathbb{R}$ and $b \in \mathbb{R} \setminus \{0\}$, and where
     - $X_{k+1}$ is the real-valued state,
     - $u_k$ is the real-valued control input,
     - $W_k$ is the real-valued stochastic noise.

     In general, the system parameters $a$ and $b$ can be time dependent.

  4. Scalar linear systems

     The controller is interested in the system

     $$X_{k+1} = a X_k + b \phi_k(X^k) + W_k. \tag{1}$$

     Observation assumptions:
     1. Before applying $u_k$, the controller observes the actual value $x_k$ of $X_k$ (hence $X_0 \equiv x_0$).
     2. The controller has perfect recall.

     The controller determines $u_k$ from the state history $x^k := (x_0, \dots, x_k)$: $u_k = \phi_k(x^k)$.
     - $\phi_k \colon \mathbb{R}^{k+1} \to \mathbb{R}$ is a feedback function,
     - $\phi := (\phi_0, \dots, \phi_n)$ is a control policy,
     - $\Phi$ denotes the set of all control policies.

  5. Scalar linear systems

     Under the same observation assumptions, the controller knows the state history $x^k = (x_0, \dots, x_k)$ and the control policy $\phi$, and can therefore calculate the noise history $w^{k-1} := (w_0, \dots, w_{k-1})$ by solving (1) for the noise: $w_\ell = x_{\ell+1} - a x_\ell - b \phi_\ell(x^\ell)$ for all $\ell \in \{0, \dots, k-1\}$.
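
The noise recovery described on this slide can be sketched in Python. This is a minimal illustration rather than the authors' code: the function names `simulate` and `recover_noise`, the policy signature `policy(k, history)` and all numeric values are assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: a feedback function receives the time index k
# and the state history (x_0, ..., x_k), and returns the control input u_k.
Policy = Callable[[int, Tuple[float, ...]], float]


def simulate(a: float, b: float, x0: float, policy: Policy,
             noise: Sequence[float]) -> List[float]:
    """Roll out X_{k+1} = a X_k + b u_k + W_k for a given noise realisation."""
    xs = [x0]
    for k, w in enumerate(noise):
        u = policy(k, tuple(xs))  # feedback on the full state history
        xs.append(a * xs[-1] + b * u + w)
    return xs


def recover_noise(a: float, b: float, xs: Sequence[float],
                  policy: Policy) -> List[float]:
    """Recompute w_0, ..., w_{k-1} from the state history x_0, ..., x_k."""
    ws = []
    for j in range(len(xs) - 1):
        u = policy(j, tuple(xs[: j + 1]))
        ws.append(xs[j + 1] - a * xs[j] - b * u)
    return ws


# Illustrative values only (not taken from the slides).
a, b = 0.5, 2.0
policy = lambda k, hist: -0.25 * hist[-1]
noise = [1.0, -2.0, 0.5]

xs = simulate(a, b, 1.0, policy, noise)
ws = recover_noise(a, b, xs, policy)
```

Because $b \neq 0$ plays no role in this inversion, the noise can be recovered for any policy; what matters is that the controller remembers every past state (perfect recall) and knows which policy it applied.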

  7. Precise noise model

     Definition (Precise noise model or PNM): The controller's beliefs about the noise $W_0, \dots, W_n$ are modelled using a linear expectation operator $E$.

  8. Optimality of a control policy

     For any control policy $\phi \in \Phi$, any $k \in \mathcal{N}$ and any state history $x^k \in \mathbb{R}^{k+1}$, we define the quadratic cost functional as

     $$J[\phi \mid x^k] := \sum_{\ell=k}^{n} \left( r \phi_\ell(x^k, X_{k+1:\ell})^2 + q X_{\ell+1}^2 \right),$$

     where $q \ge 0$ and $r > 0$ are real-valued coefficients.

     Definition (Optimality): A control policy $\hat{\phi}$ is optimal if, for all $x_0$,

     $$\hat{\phi} \in \arg\min_{\phi \in \Phi} E(J[\phi \mid x_0]).$$
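
To make the cost functional concrete, the sketch below evaluates it for one realised trajectory, that is, for fixed control inputs and the states they led to. The helper name `quadratic_cost` and the numbers are assumptions for illustration; the slide itself defines the cost in terms of random states and then takes an expectation, which this snippet does not attempt.

```python
from typing import Sequence


def quadratic_cost(us: Sequence[float], xs_next: Sequence[float],
                   q: float, r: float) -> float:
    """Realised cost sum_l ( r * u_l^2 + q * x_{l+1}^2 ).

    us[i] is the control input applied at step k + i and xs_next[i] is the
    state reached at step k + i + 1, so both sequences have equal length.
    """
    if len(us) != len(xs_next):
        raise ValueError("need exactly one successor state per control input")
    if q < 0 or r <= 0:
        raise ValueError("the slides require q >= 0 and r > 0")
    return sum(r * u * u + q * x * x for u, x in zip(us, xs_next))


# Illustrative values only (not taken from the slides).
cost = quadratic_cost(us=[-0.25, -0.25, 0.5],
                      xs_next=[1.0, -2.0, 0.5],
                      q=1.0, r=2.0)
```

Note the index shift: the input at step $\ell$ is penalised together with the state at step $\ell + 1$, so the initial state $x_k$ itself never enters the sum.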

  10. Optimality of a control policy

      Assume that at time $k$ the controller knows the state history $x^k$ and the noise history $w^{k-1}$. We should only compare control policies $\phi \in \Phi$ that could have resulted in $x^k$ and $w^{k-1}$, i.e. those in

      $$\Phi(x^k, w^{k-1}) := \{ \phi \in \Phi : \phi,\ x^k \text{ and } w^{k-1} \text{ are a solution of the system dynamics} \}.$$

      Definition (Optimality): A control policy $\hat{\phi}$ is optimal for the state history $x^k$ and the noise history $w^{k-1}$ if

      $$\hat{\phi} \in \arg\min_{\phi \in \Phi(x^k, w^{k-1})} E(J[\phi \mid x^k] \mid w^{k-1}).$$

  13. The principle of optimality

      Principle of optimality: A control policy that is “optimal” for the “current state” should also be optimal for the “remaining states” it can end up in.

      Assume that $\hat{\phi}$ is optimal for all $x_0 \in \mathbb{R}$. The controller
      1. observes $x_0$,
      2. applies $u_0 = \hat{\phi}_0(x_0)$,
      3. observes $x_1$ and computes $w_0$.

      Is $\hat{\phi}$ optimal for $(x_0, x_1)$ and $w_0$? Not necessarily!

      Definition (Complete optimality): If, for all $k \in \mathcal{N}$, the control policy $\phi \in \Phi$ is optimal for all $x^k$ and $w^{k-1}$ such that $x^k$, $w^{k-1}$ and $\phi$ are compatible, then it is completely optimal.

  15. Unique optimal control policy

      Theorem: The unique completely optimal control policy $\hat{\phi}$ is given by

      $$\hat{\phi}_k(x^k) := -\tilde{r}_k b \left( m_{k+1} a x_k + h_{k \mid w^{k-1}} \right).$$

      $\tilde{r}_k$ and $m_{k+1}$ are derived from backwards recursive relations. The feedforward $h_{k \mid w^{k-1}}$ is derived from $h_{n+1 \mid w^n} := 0$ and

      $$h_{k \mid w^{k-1}} := a \tilde{r}_{k+1} r \, E\left( h_{k+1 \mid w^{k-1}, W_k} \mid w^{k-1} \right) + m_{k+1} E(W_k \mid w^{k-1}).$$

      − Precise specification of the noise model is necessary.
      − Calculating the feedforward is intractable.
      − Backwards recursive calculations.
      + Almost immediately generalisable to time-dependent $a_k$, $b_k$, $r_k$ and $q_{k+1}$ and/or multi-dimensional systems.

  18. Unique optimal control policy

      Disadvantages

      − Calculating the feedforward is intractable.
        → White noise model: $W_0, \dots, W_n$ are mutually independent. The feedforward $h_k$ is then derived from $h_{n+1} := 0$ and

          $$h_k := a \tilde{r}_{k+1} r h_{k+1} + m_{k+1} E(W_k).$$

      − Backwards recursive calculations.
        → White noise model and stationarity simplify these calculations. If $E(W_k) \equiv E(W)$ for all $k \in \mathcal{N}$, then

          $$m_{k+1} \xrightarrow{n \to \infty} m, \qquad \tilde{r}_k \xrightarrow{n \to \infty} \tilde{r}, \qquad h_k \xrightarrow{n \to \infty} h.$$
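
The white-noise recursion can be sketched as follows. One assumption needs flagging: the slides only say that $\tilde{r}_k$ and $m_{k+1}$ come from "backwards recursive relations" without stating them, so the code uses the standard scalar dynamic-programming (Riccati) recursion $m_{n+1} = q$, $\tilde{r}_k = 1 / (r + b^2 m_{k+1})$, $m_k = q + a^2 r \tilde{r}_k m_{k+1}$, which may differ in form from the authors' own. The function names and numeric values are likewise illustrative.

```python
from typing import List, Sequence, Tuple


def white_noise_gains(a: float, b: float, q: float, r: float,
                      mean_w: Sequence[float]
                      ) -> Tuple[List[float], List[float], List[float]]:
    """Backward recursion for m, r_tilde and h under white noise.

    mean_w[k] = E(W_k) for k = 0..n. Returns (m, r_tilde, h), where m and h
    are indexed 0..n+1 and r_tilde is indexed 0..n.
    """
    n = len(mean_w) - 1
    m = [0.0] * (n + 2)
    r_tilde = [0.0] * (n + 1)
    h = [0.0] * (n + 2)          # h[n + 1] = 0 is the terminal condition
    m[n + 1] = q                 # ASSUMED terminal condition (standard LQ)
    for k in range(n, -1, -1):
        # ASSUMED standard scalar Riccati step; not spelled out on the slides.
        r_tilde[k] = 1.0 / (r + b * b * m[k + 1])
        m[k] = q + a * a * r * r_tilde[k] * m[k + 1]
        # Feedforward recursion from the slide; the first term vanishes at
        # k = n because h[n + 1] = 0.
        carry = a * r_tilde[k + 1] * r * h[k + 1] if k < n else 0.0
        h[k] = carry + m[k + 1] * mean_w[k]
    return m, r_tilde, h


def control(a: float, b: float, m: Sequence[float], r_tilde: Sequence[float],
            h: Sequence[float], k: int, x_k: float) -> float:
    """Control input u_k = -r_tilde_k * b * (m_{k+1} * a * x_k + h_k)."""
    return -r_tilde[k] * b * (m[k + 1] * a * x_k + h[k])


# Illustrative values only: horizon n = 1 with E(W_k) = 1.
m, r_tilde, h = white_noise_gains(1.0, 1.0, 1.0, 1.0, [1.0, 1.0])
u_last = control(1.0, 1.0, m, r_tilde, h, k=1, x_k=0.0)

# Stationary case with a long horizon: m[0] settles near a fixed point.
m_long, _, _ = white_noise_gains(1.0, 1.0, 1.0, 1.0, [0.0] * 60)
```

Under these assumed recursions and with $a = b = q = r = 1$, the early entries of `m_long` settle at the positive root of $m^2 - m - 1 = 0$, which illustrates the convergence claimed for the stationary case.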

  21. Partially specified noise model

      − Precise specification of the noise model is necessary.

      Definition (Partially specified noise model or PSNM): The partially specified noise model $\mathcal{E}$ is the largest subset of the set of all precise noise models such that, for all $E \in \mathcal{E}$, all $k \in \mathcal{N}$ and all $w^{k-1}$,

      $$\underline{E}(W_k) \le E(W_k \mid w^{k-1}) \le \overline{E}(W_k),$$

      where $\underline{E}(W_k)$ and $\overline{E}(W_k)$ are the specified lower and upper bounds on the expected noise.

      Note: $\mathcal{E}$ does not assume independence!

      Definition (E-admissibility): A control policy is E-admissible if it is completely optimal for at least one precise noise model in the partially specified noise model.

  22. E-admissible control policies

      From the definition of E-admissibility, it follows immediately that any E-admissible control policy has the form

      $$\phi_k(x^k) = -\tilde{r}_k b \left( m_{k+1} a x_k + h_{k \mid w^{k-1}} \right).$$

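  23. E-admissible control policies

      Theorem: For any E-admissible control policy, the feedforward term $h_{k \mid w^{k-1}}$ is bounded: for all $k \in \mathcal{N}$ and for all noise histories $w^{k-1}$,

      $$\underline{h}_k \le h_{k \mid w^{k-1}} \le \overline{h}_k.$$

      Moreover, any $h_{k \mid w^{k-1}} \in [\underline{h}_k, \overline{h}_k]$ is reached by some $E \in \mathcal{E}$.

      The strict bounds $\underline{h}_k$ and $\overline{h}_k$ are derived from $[\underline{h}_{n+1}, \overline{h}_{n+1}] := 0$ and

      $$[\underline{h}_k, \overline{h}_k] := a \tilde{r}_{k+1} r \, [\underline{h}_{k+1}, \overline{h}_{k+1}] + m_{k+1} [\underline{E}(W_k), \overline{E}(W_k)].$$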
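
The bound recursion can be sketched with elementary interval arithmetic. Two assumptions are made here: the bracket recursion is read as interval scaling and addition (so a negative factor $a \tilde{r}_{k+1} r$ swaps the endpoints), and $\tilde{r}_k$ and $m_{k+1}$ follow the standard scalar Riccati recursion, which the slides do not spell out. The function name and the numbers are illustrative.

```python
from typing import List, Sequence, Tuple


def feedforward_bounds(a: float, b: float, q: float, r: float,
                       lower_mean: Sequence[float],
                       upper_mean: Sequence[float]
                       ) -> Tuple[List[float], List[float]]:
    """Lower and upper feedforward bounds, indexed 0..n+1.

    lower_mean[k] and upper_mean[k] are the bounds on E(W_k | w^{k-1}).
    """
    if len(lower_mean) != len(upper_mean):
        raise ValueError("need one lower and one upper bound per time step")
    n = len(lower_mean) - 1

    # ASSUMED standard scalar Riccati recursion for m and r_tilde.
    m = [0.0] * (n + 2)
    r_tilde = [0.0] * (n + 1)
    m[n + 1] = q
    for k in range(n, -1, -1):
        r_tilde[k] = 1.0 / (r + b * b * m[k + 1])
        m[k] = q + a * a * r * r_tilde[k] * m[k + 1]

    lo = [0.0] * (n + 2)   # terminal interval is [0, 0]
    hi = [0.0] * (n + 2)
    for k in range(n, -1, -1):
        c = a * r_tilde[k + 1] * r if k < n else 0.0
        # Scaling an interval by c swaps its endpoints when c < 0.
        scaled = (c * lo[k + 1], c * hi[k + 1])
        # m[k + 1] >= 0 because q >= 0, so this term keeps its orientation.
        lo[k] = min(scaled) + m[k + 1] * lower_mean[k]
        hi[k] = max(scaled) + m[k + 1] * upper_mean[k]
    return lo, hi


# Illustrative values only: a < 0 exercises the endpoint swap.
lo, hi = feedforward_bounds(-1.0, 1.0, 1.0, 1.0, [0.0, 0.0], [1.0, 1.0])
```

Each step costs a constant number of operations, in contrast with the conditional-expectation recursion for a single precise noise model, which has to track the whole noise history.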

  24. E-admissible control policies

      Assessment of the bounds on the feedforward term $h_{k \mid w^{k-1}}$:

      + Imprecise specification of the noise model suffices.
      + Computation of $\underline{h}_k$ and $\overline{h}_k$ is tractable.
      + Easily generalised to $a_k$, $b_k$, $r_k$ and $q_{k+1}$.
      − Backwards recursive calculations.
      ? Which control policy to apply?
      ? Generalisation to multi-dimensional systems is not immediate.
