LQ optimal control for partially specified input noise

Alexander Erreygers, Jasper De Bock, Gert de Cooman, Arthur Van Camp
Ghent University

28th European Conference on Operational Research
Scalar linear systems

The controller is interested in the system

\[ X_{k+1} = a X_k + b u_k + W_k, \tag{1} \]

for $k \in N := \{0, 1, \dots, n\}$, where $n \in \mathbb{N}$, $a \in \mathbb{R}$ and $b \in \mathbb{R} \setminus \{0\}$, and where $X_{k+1}$ is the real-valued state, $u_k$ is the real-valued control input, and $W_k$ is the real-valued stochastic noise. In general, the system parameters $a$ and $b$ can be time dependent.
Observation assumptions

1. Before applying $u_k$, the controller observes the actual value $x_k$ of $X_k$ (hence $X_0 \equiv x_0$).
2. The controller has perfect recall.
The controller determines $u_k$ from the state history $\mathbf{x}_k := (x_0, \dots, x_k)$:

\[ u_k = \phi_k(\mathbf{x}_k), \]

where $\phi_k \colon \mathbb{R}^{k+1} \to \mathbb{R}$ is a feedback function, $\phi := (\phi_0, \dots, \phi_n)$ is a control policy, and $\Phi$ denotes the set of all control policies. The system (1) then reads

\[ X_{k+1} = a X_k + b \phi_k(\mathbf{X}_k) + W_k. \]
Because the controller knows the state history $\mathbf{x}_k$ and the control policy $\phi$, it can calculate the noise history $\mathbf{w}_{k-1} := (w_0, \dots, w_{k-1})$ from (1): $w_\ell = x_{\ell+1} - a x_\ell - b \phi_\ell(\mathbf{x}_\ell)$ for all $\ell < k$.
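The noise recovery described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the presentation: the names `simulate` and `recover_noise`, the proportional example policy and all numeric values are hypothetical.

```python
from typing import Callable, List, Sequence

# A feedback function maps the state history (x_0, ..., x_k) to a control input u_k.
Feedback = Callable[[Sequence[float]], float]


def simulate(a: float, b: float, policy: Sequence[Feedback],
             x0: float, noise: Sequence[float]) -> List[float]:
    """Roll out X_{k+1} = a X_k + b phi_k(x_0..x_k) + W_k for given noise values."""
    xs = [x0]
    for phi_k, w_k in zip(policy, noise):
        u_k = phi_k(xs)
        xs.append(a * xs[-1] + b * u_k + w_k)
    return xs


def recover_noise(a: float, b: float, policy: Sequence[Feedback],
                  xs: Sequence[float]) -> List[float]:
    """Invert the dynamics: w_k = x_{k+1} - a x_k - b phi_k(x_0..x_k)."""
    return [xs[k + 1] - a * xs[k] - b * policy[k](xs[:k + 1])
            for k in range(len(xs) - 1)]


# Hypothetical example with n = 2: proportional feedback on the latest state.
a, b = 0.5, 2.0
policy = [lambda hist: -0.25 * hist[-1]] * 3
xs = simulate(a, b, policy, x0=1.0, noise=[1.0, -2.0, 0.5])
ws = recover_noise(a, b, policy, xs)  # reproduces the noise values fed to simulate
```

The recovery only works because $b$, $a$ and the applied policy are known to the controller and every state is observed exactly, which is what the two observation assumptions guarantee.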
Optimality of a control policy

For any control policy $\phi \in \Phi$, any $k \in N$ and any state history $\mathbf{x}_k \in \mathbb{R}^{k+1}$, we define the quadratic cost functional as

\[ J[\phi \mid \mathbf{x}_k] := \sum_{\ell=k}^{n} \left( r \, \phi_\ell(\mathbf{x}_k, X_{k+1:\ell})^2 + q X_{\ell+1}^2 \right), \]

where $q \geq 0$ and $r > 0$ are real-valued coefficients.
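The cost functional is a random quantity, because it depends on the future states. For one realised trajectory it reduces to a plain sum, which the following sketch evaluates. The function name `realised_cost` and the example numbers are assumptions made for illustration only.

```python
from typing import Callable, Sequence

Feedback = Callable[[Sequence[float]], float]


def realised_cost(policy: Sequence[Feedback], xs: Sequence[float],
                  k: int, q: float, r: float) -> float:
    """Sum r * u_l^2 + q * x_{l+1}^2 for l = k, ..., n along one trajectory.

    xs holds the realised states (x_0, ..., x_{n+1}); u_l = phi_l(x_0..x_l).
    """
    n = len(xs) - 2
    return sum(r * policy[l](xs[:l + 1]) ** 2 + q * xs[l + 1] ** 2
               for l in range(k, n + 1))


# Hypothetical trajectory and policy (n = 2).
policy = [lambda hist: -0.25 * hist[-1]] * 3
xs = [1.0, 1.0, -2.0, 0.5]
cost_from_0 = realised_cost(policy, xs, k=0, q=1.0, r=2.0)
cost_from_1 = realised_cost(policy, xs, k=1, q=1.0, r=2.0)
```

Taking the expectation of this sum over the noise is what turns it into the criterion that the controller minimises.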
Precise noise model

Definition (Precise noise model or PNM): The controller’s beliefs about the noise $W_0, \dots, W_n$ are modelled using a linear expectation operator $E$.
Optimality of a control policy

Definition (Optimality): A control policy $\hat{\phi}$ is optimal if, for all $x_0$,

\[ \hat{\phi} \in \arg\min_{\phi \in \Phi} E(J[\phi \mid x_0]). \]
Assume that at time $k$ the controller knows the state history $\mathbf{x}_k$ and the noise history $\mathbf{w}_{k-1}$. We should only compare control policies $\phi \in \Phi$ that could have resulted in $\mathbf{x}_k$ and $\mathbf{w}_{k-1}$, i.e. such that $\mathbf{x}_k$, $\mathbf{w}_{k-1}$ and $\phi$ are a solution of the system dynamics:

\[ \Phi(\mathbf{x}_k, \mathbf{w}_{k-1}) := \{ \phi \in \Phi \colon \phi, \mathbf{x}_k \text{ and } \mathbf{w}_{k-1} \text{ are a solution of the system dynamics} \}. \]

Definition (Optimality): A control policy $\hat{\phi}$ is optimal for the state history $\mathbf{x}_k$ and the noise history $\mathbf{w}_{k-1}$ if

\[ \hat{\phi} \in \arg\min_{\phi \in \Phi(\mathbf{x}_k, \mathbf{w}_{k-1})} E(J[\phi \mid \mathbf{x}_k] \mid \mathbf{w}_{k-1}). \]
The principle of optimality

Principle of optimality: A control policy that is “optimal” for the “current state” should also be optimal for the “remaining states” it can end up in.

Assume that $\hat{\phi}$ is optimal for all $x_0 \in \mathbb{R}$. The controller

1. observes $x_0$,
2. applies $u_0 = \hat{\phi}_0(x_0)$,
3. observes $x_1$ and computes $w_0$.

Is $\hat{\phi}$ optimal for $(x_0, x_1)$ and $w_0$? Not necessarily!

Definition (Complete optimality): If, for all $k \in N$, the control policy $\phi \in \Phi$ is optimal for all $\mathbf{x}_k$ and $\mathbf{w}_{k-1}$ such that $\mathbf{x}_k$, $\mathbf{w}_{k-1}$ and $\phi$ are compatible, then it is completely optimal.
Unique optimal control policy

Theorem: The unique completely optimal control policy $\hat{\phi}$ is given by

\[ \hat{\phi}_k(\mathbf{x}_k) := -\tilde{r}_k b \left( m_{k+1} a x_k + h_{k \mid \mathbf{w}_{k-1}} \right). \]

The coefficients $\tilde{r}_k$ and $m_{k+1}$ are derived from backwards recursive relations. The feedforward $h_{k \mid \mathbf{w}_{k-1}}$ is derived from $h_{n+1 \mid \mathbf{w}_n} := 0$ and

\[ h_{k \mid \mathbf{w}_{k-1}} := a \tilde{r}_{k+1} r \, E\left( h_{k+1 \mid \mathbf{w}_{k-1}, W_k} \mid \mathbf{w}_{k-1} \right) + m_{k+1} E(W_k \mid \mathbf{w}_{k-1}). \]

− Precise specification of noise model is necessary.
− Calculating the feedforward is intractable.
− Backwards recursive calculations.
+ Almost immediately generalisable to time-dependent $a_k$, $b_k$, $r_k$ and $q_{k+1}$ and/or multi-dimensional systems.
Disadvantages

− Calculating the feedforward is intractable.
→ White noise model: $W_0, \dots, W_n$ are mutually independent. The feedforward $h_k$ is derived from $h_{n+1} := 0$ and

\[ h_k := a \tilde{r}_{k+1} r h_{k+1} + m_{k+1} E(W_k). \]

− Backwards recursive calculations.
→ White noise model & stationarity simplify these calculations. If $E(W_k) \equiv E(W)$ for all $k \in N$, then

\[ m_{k+1} \xrightarrow{n \to \infty} m, \qquad \tilde{r}_k \xrightarrow{n \to \infty} \tilde{r}, \qquad h_k \xrightarrow{n \to \infty} h. \]
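Under the white noise model, the backward pass can be sketched as follows. The feedforward recursion is the one stated above. The recursions for $\tilde{r}_k$ and $m_{k+1}$ are not given on the slides; the ones used here, $\tilde{r}_k = 1/(r + b^2 m_{k+1})$ and $m_k = q + a^2 r \tilde{r}_k m_{k+1}$ with $m_{n+1} = q$, are an assumption obtained from the standard dynamic-programming derivation for this scalar quadratic cost, and may differ in form from the authors' definitions. The function names and numeric values are hypothetical.

```python
from typing import List, Sequence, Tuple


def white_noise_gains(a: float, b: float, q: float, r: float,
                      mean_noise: Sequence[float]
                      ) -> Tuple[List[float], List[float], List[float]]:
    """Backward pass for k = n, ..., 0 under the white noise model.

    Returns (r_tilde, m_next, h) with r_tilde[k] = r~_k, m_next[k] = m_{k+1}
    and h[k] = h_k. mean_noise[k] is E(W_k).
    """
    n = len(mean_noise) - 1
    r_tilde = [0.0] * (n + 1)
    m_next = [0.0] * (n + 1)
    h = [0.0] * (n + 1)
    m = q        # ASSUMPTION: terminal value m_{n+1} = q
    carry = 0.0  # a * r~_{k+1} * r * h_{k+1}; zero at k = n because h_{n+1} = 0
    for k in range(n, -1, -1):
        m_next[k] = m
        r_tilde[k] = 1.0 / (r + b * b * m)    # ASSUMPTION: r~_k = 1 / (r + b^2 m_{k+1})
        h[k] = carry + m * mean_noise[k]      # feedforward recursion from the slide
        carry = a * r_tilde[k] * r * h[k]
        m = q + a * a * r * r_tilde[k] * m    # ASSUMPTION: m_k = q + a^2 r r~_k m_{k+1}
    return r_tilde, m_next, h


def control(a: float, b: float, r_tilde: Sequence[float],
            m_next: Sequence[float], h: Sequence[float],
            k: int, x_k: float) -> float:
    """Feedback law u_k = -r~_k b (m_{k+1} a x_k + h_k)."""
    return -r_tilde[k] * b * (m_next[k] * a * x_k + h[k])


# Hypothetical stationary example: E(W_k) = 0.5 for all k, long horizon.
r_tilde, m_next, h = white_noise_gains(1.0, 1.0, 1.0, 1.0, [0.5] * 201)
```

With a constant expected noise and a long horizon, the values computed for the earliest time steps barely change from one step to the next, which illustrates the stationary limits $m$, $\tilde{r}$ and $h$.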
Partially specified noise model

− Precise specification of noise model is necessary.

Definition (Partially specified noise model or PSNM): The partially specified noise model $\mathcal{E}$ is the largest subset of the set of all precise noise models such that, for all $E \in \mathcal{E}$, all $k \in N$ and all $\mathbf{w}_{k-1}$,

\[ \underline{E}(W_k) \leq E(W_k \mid \mathbf{w}_{k-1}) \leq \overline{E}(W_k), \]

where $\underline{E}(W_k)$ and $\overline{E}(W_k)$ are the specified lower and upper bounds on the expected noise at time $k$.

Note: $\mathcal{E}$ does not assume independence!

Definition (E-admissibility): A control policy is E-admissible if it is completely optimal for at least one precise noise model in the partially specified noise model.
E-admissible control policies

From the definition of E-admissibility, it follows immediately that any E-admissible control policy has the form

\[ \phi_k(\mathbf{x}_k) = -\tilde{r}_k b \left( m_{k+1} a x_k + h_{k \mid \mathbf{w}_{k-1}} \right). \]
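Theorem: For any E-admissible control policy, the feedforward term $h_{k \mid \mathbf{w}_{k-1}}$ is bounded: for all $k \in N$ and for all noise histories $\mathbf{w}_{k-1}$,

\[ \underline{h}_k \leq h_{k \mid \mathbf{w}_{k-1}} \leq \overline{h}_k. \]

Moreover, any $h_{k \mid \mathbf{w}_{k-1}} \in [\underline{h}_k, \overline{h}_k]$ is reached by some $E \in \mathcal{E}$. The strict bounds $\underline{h}_k$ and $\overline{h}_k$ are derived from $[\underline{h}_{n+1}, \overline{h}_{n+1}] := 0$ and

\[ [\underline{h}_k, \overline{h}_k] := a \tilde{r}_{k+1} r \, [\underline{h}_{k+1}, \overline{h}_{k+1}] + m_{k+1} [\underline{E}(W_k), \overline{E}(W_k)]. \]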
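The interval recursion for the bounds can be sketched in the same way as a precise backward pass, with intervals in place of numbers. As before, this is an illustration under assumptions rather than the authors' implementation: the recursions used for $\tilde{r}_k$ and $m_{k+1}$ are not on the slides and are taken from the standard scalar dynamic-programming derivation, and the name `feedforward_bounds` and the numeric values are hypothetical.

```python
from typing import List, Sequence, Tuple


def feedforward_bounds(a: float, b: float, q: float, r: float,
                       lower_mean: Sequence[float],
                       upper_mean: Sequence[float]
                       ) -> Tuple[List[float], List[float]]:
    """Backward interval recursion for the lower and upper feedforward bounds.

    lower_mean[k] and upper_mean[k] bound the expected noise at time k.
    Returns (h_lo, h_hi) with h_lo[k] <= h_hi[k] for k = 0, ..., n.
    """
    n = len(lower_mean) - 1
    h_lo = [0.0] * (n + 1)
    h_hi = [0.0] * (n + 1)
    m = q                        # ASSUMPTION: terminal value m_{n+1} = q
    carry_lo = carry_hi = 0.0    # scaled interval from step k+1; zero at k = n
    for k in range(n, -1, -1):
        r_tilde = 1.0 / (r + b * b * m)   # ASSUMPTION: r~_k = 1 / (r + b^2 m_{k+1})
        # m >= 0 because q >= 0, so scaling by m keeps the endpoint order.
        h_lo[k] = carry_lo + m * lower_mean[k]
        h_hi[k] = carry_hi + m * upper_mean[k]
        # Scaling an interval by a negative factor (a < 0) swaps its endpoints.
        c = a * r_tilde * r
        ends = (c * h_lo[k], c * h_hi[k])
        carry_lo, carry_hi = min(ends), max(ends)
        m = q + a * a * r * r_tilde * m   # ASSUMPTION: m_k = q + a^2 r r~_k m_{k+1}
    return h_lo, h_hi


# Hypothetical example: expected noise known only to lie in [0, 1] at each step.
h_lo, h_hi = feedforward_bounds(-1.0, 1.0, 1.0, 1.0, [0.0, 0.0], [1.0, 1.0])
```

Each step costs a constant number of operations, so the whole pass is linear in the horizon; when the lower and upper expectations coincide, the interval collapses to the single white-noise feedforward value.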
+ Imprecise specification.
+ Computation of $\underline{h}_k$ and $\overline{h}_k$ is tractable.
+ Easily generalised to $a_k$, $b_k$, $r_k$ and $q_{k+1}$.
? Which control policy to apply?
− Backwards recursive calculations.
? Generalisation to multi-dimensional systems is not immediate.