

  1. The separation principle in stochastic control, revisited
     Workshop in honor of Eduardo Sontag on the occasion of his 60th birthday
     Tryphon T. Georgiou, joint work with Anders Lindquist

  2. Linear stochastic system (block diagram: noise w and control u drive the linear stochastic system, the observation y is fed to the controller π, which returns u):

        dx = A(t) x(t) dt + B_1(t) u(t) dt + B_2(t) dw
        dy = C(t) x(t) dt + D(t) dw

     — w(t) is a vector-valued Wiener process
     — x(0) is a Gaussian random vector independent of w(t), y(0) = 0
     — A, B_1, B_2, C, D are matrix-valued functions

     Goal: design a nonanticipatory control π : y ↦ u that minimizes

        J(u) = E{ ∫_0^T x(t)' Q(t) x(t) dt + ∫_0^T u(t)' R(t) u(t) dt + x(T)' S x(T) }
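To make the setup concrete, here is a minimal simulation sketch (my own illustration, not from the slides): an Euler–Maruyama discretization of the state and observation equations and a Monte Carlo estimate of J(u). All matrices and the simple output-feedback law below are arbitrary placeholders.

```python
# Illustrative sketch (not from the slides): Euler-Maruyama simulation of
#   dx = A x dt + B1 u dt + B2 dw,   dy = C x dt + D dw,
# and a Monte Carlo estimate of J(u) for a simple placeholder control law.
import numpy as np

rng = np.random.default_rng(0)
T, N = 1.0, 400
dt = T / N

A  = np.array([[0.0, 1.0], [-1.0, -0.5]])   # all matrices chosen arbitrarily
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[0.1, 0.0], [0.0, 0.1]])
C  = np.array([[1.0, 0.0]])
D  = np.array([[0.0, 0.2]])
Q, R, S = np.eye(2), np.array([[1.0]]), np.eye(2)

def pi_control(y_path, k):
    """Placeholder nonanticipatory law pi: y -> u (a static output gain)."""
    return -0.5 * y_path[k]                  # uses only observations up to t_k

def run_once():
    x = 0.1 * rng.normal(size=2)             # Gaussian x(0), independent of w
    y = np.zeros((N + 1, 1))                 # y(0) = 0
    cost = 0.0
    for k in range(N):
        u = pi_control(y, k)
        dw = np.sqrt(dt) * rng.normal(size=2)
        cost += (x @ Q @ x + u @ R @ u) * dt
        x = x + (A @ x + B1 @ u) * dt + B2 @ dw
        y[k + 1] = y[k] + (C @ x) * dt + D @ dw
    return cost + x @ S @ x

print("Monte Carlo estimate of J(u):", np.mean([run_once() for _ in range(200)]))
```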

  3. Separation principle: under suitable assumptions on the class of admissible controls π : y ↦ u, the "optimal control" is

        u(t) = K(t) x̂(t),   where x̂(t) = E{ x(t) | Y_t },
        dx̂ = A(t) x̂(t) dt + B_1(t) u(t) dt + L(t) ( dy − C(t) x̂(t) dt ),   x̂(0) = 0,

     with K(t) and L(t) computed via a pair of dual Riccati equations.

     NB:
     — attempts to prove separation for u(t) Y_t-measurable (a.s.) ...
     — too big a class; we know of no proof which is correct (strong solutions)

  4. Historical remarks: Wonham, Kushner, Lindquist, Fleming & Rishel
     • treatment overburdened with technicalities
     • folk accounts not supported by existing proofs
     • the non-Gaussian nature due to an a-priori nonlinear π is often overlooked
     • herein, the separation principle for:
       – the most natural class of controls: all linear/nonlinear and even discontinuous laws such that the feedback loop makes "engineering" sense
       – the engineering viewpoint: signals = sample functions
       – general semimartingale driving noise, with jumps
       – delay-differential linear systems, etc.

  5. The standard "completion of squares":

        J(u) = E{ x(0)' P(0) x(0) + ∫_0^T (u − Kx)' R (u − Kx) dt + ∫_0^T tr( B_2' P B_2 ) dt }

     where

        Ṗ = −A'P − PA + P B_1 R^{-1} B_1' P − Q,   P(T) = S,
        K(t) := −R(t)^{-1} B_1(t)' P(t).

     Using Itô's rule:

        d( x'Px ) = x' Ṗ x dt + 2 x' P dx + tr( B_2' P B_2 ) dt
                  = [ −x'Qx − u'Ru + (u − Kx)' R (u − Kx) + tr( B_2' P B_2 ) ] dt + 2 x' P B_2 dw

     With "complete state information":  u_optimal(t) = K(t) x(t).
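A small sketch of how K(t) can be obtained in practice: integrate the Riccati equation above backward in time from P(T) = S and set K(t) = −R(t)^{-1} B_1(t)' P(t). The system and cost matrices below are illustrative placeholders.

```python
# Sketch: backward Euler integration of the Riccati equation
#   Pdot = -A'P - P A + P B1 R^{-1} B1' P - Q,   P(T) = S,
# followed by K(t) = -R(t)^{-1} B1(t)' P(t). Matrices are illustrative only.
import numpy as np

T, N = 1.0, 1000
dt = T / N
A  = np.array([[0.0, 1.0], [-1.0, -0.5]])
B1 = np.array([[0.0], [1.0]])
Q, R, S = np.eye(2), np.array([[1.0]]), np.eye(2)
Rinv = np.linalg.inv(R)

P = S.copy()
K = [None] * (N + 1)                      # K[k] approximates K(k * dt)
K[N] = -Rinv @ B1.T @ P
for k in range(N, 0, -1):                 # march backward from t = T to t = 0
    Pdot = -A.T @ P - P @ A + P @ B1 @ Rinv @ B1.T @ P - Q
    P = P - Pdot * dt                     # step from t_k back to t_{k-1}
    P = 0.5 * (P + P.T)                   # keep P symmetric against round-off
    K[k - 1] = -Rinv @ B1.T @ P

print("P(0) =\n", P)
print("K(0) =", K[0])
```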

  6. Incomplete state information: u(t) needs to be a function of { y(s) ; 0 ≤ s ≤ t }.

     Standard recipe:  u(t) = K(t) x̂(t),  where x̂(t) = E{ x(t) | Y_t }.

     Justification ⇔ separation theorem

  7. Where is the potential problem? Set x̃(t) := x(t) − x̂(t). Then

        E ∫_0^T (u − Kx)' R (u − Kx) dt = E ∫_0^T (u − Kx̂)' R (u − Kx̂) dt + ∫_0^T tr( K'RK Σ ) dt,

     since E{ [u(t) − K(t)x̂(t)] x̃(t)' } = 0, and where Σ(t) := E{ x̃(t) x̃(t)' }.

     Why isn't it obvious that u = Kx̂ is optimal?
     Subtlety: in general, Σ may depend on the control.
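Spelled out, the displayed identity comes from the orthogonal decomposition u − Kx = (u − Kx̂) − Kx̃ (a standard completion-of-squares step, written out here for readability):

```latex
% Completion-of-squares decomposition behind the displayed identity:
% write u - Kx = (u - K\hat{x}) - K\tilde{x}, expand, and take expectations.
\begin{aligned}
(u-Kx)'R(u-Kx)
   &= (u-K\hat{x})'R(u-K\hat{x})
      - 2\,(u-K\hat{x})'RK\tilde{x}
      + \tilde{x}'K'RK\tilde{x},\\[2pt]
E\{(u-K\hat{x})'RK\tilde{x}\} &= 0
   \quad\text{because } E\{[u(t)-K(t)\hat{x}(t)]\,\tilde{x}(t)'\}=0,\\[2pt]
E\{\tilde{x}'K'RK\tilde{x}\}  &= \operatorname{tr}\bigl(K'RK\,\Sigma\bigr),
   \qquad \Sigma(t)=E\{\tilde{x}(t)\tilde{x}(t)'\}.
\end{aligned}
```

Minimizing the first term pointwise suggests u = Kx̂; the open question is whether the trace term, through Σ, also depends on the control.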

  8. Source of fallacy (?): due to linearity,

        x(t) = x_0(t) + ∫_0^t Φ(t,s) B_1(s) u(s) ds,

     the control term cancels out: x̃(t) = x̃_0(t) := x_0(t) − x̂_0(t), where x̂_0(t) := E{ x_0(t) | Y_t }.

     How could E{ x̃_0(t) x̃_0(t)' } depend on the control? Because the filtration Y_t, and hence x̂_0, might depend on u!
     — u is in general a nonlinear function of y
     — hence, y may not be Gaussian
     — despite the fact that x_0 is Gaussian, x̂_0(t) = E{ x_0(t) | Y_t } may not be linear in the data { y(τ) ; τ ∈ [0,t] }
     — x̂_0(t) may not be given by a Kalman filter.

  9. Generalization – notation (block diagram: z_0 enters a summing junction producing z, the read-out H gives y, the controller π maps y to u, and u re-enters the sum through g):

        z(t) = z_0(t) + ∫_0^t G(t,τ) u(τ) dτ
        y(t) = H z(t)

     where g : (t, u) ↦ ∫_0^t G(t,τ) u(τ) dτ.

     E.g., z(t) = [ x(t) ; y(t) ] and H = [ 0 , I ].

  10. Ways out (?)  SOL: stochastic open loop (Lindquist) — limit the control so as to be adapted to { Y_t^0 } (block diagram: two configurations of the loop, one driven by y = Hz and one by the uncontrolled output y_0 = H z_0).

      Examples:
      — linear control
      — Lipschitz feedback

  11. E.g., control adapted to { Y_t^0 } (block diagram: the control contribution g(u) is subtracted from z before the read-out H, recovering y_0 = H z_0, which drives the controller π).

  12. Example: linear feedback

        u(t) = u_deterministic(t) + ∫_0^t F(t,τ) dy(τ).

      Then the Gaussian character is preserved, and it can be shown that Y_t = Y_t^0. Hence

        dx̃ = ( A − LC ) x̃ dt + ( B_2 − LD ) dw,   x̃(0) = x(0),

      and Σ(t) := E{ x̃(t) x̃(t)' } is independent of u.
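Since the error equation above contains no u, Σ(t) can be propagated by the matrix Lyapunov differential equation Σ̇ = (A − LC)Σ + Σ(A − LC)' + (B_2 − LD)(B_2 − LD)'. A sketch with illustrative matrices and a fixed placeholder gain L:

```python
# Sketch: propagate Sigma(t) = E{ xtilde(t) xtilde(t)' } from the error equation
#   dxtilde = (A - L C) xtilde dt + (B2 - L D) dw,   xtilde(0) = x(0),
# via the Lyapunov ODE  Sigmadot = (A-LC) Sigma + Sigma (A-LC)' + (B2-LD)(B2-LD)'.
# No control u appears anywhere. L is a frozen placeholder; matrices illustrative.
import numpy as np

T, N = 1.0, 1000
dt = T / N
A  = np.array([[0.0, 1.0], [-1.0, -0.5]])
B2 = np.array([[0.1, 0.0], [0.0, 0.1]])
C  = np.array([[1.0, 0.0]])
D  = np.array([[0.0, 0.2]])
L  = np.array([[0.8], [0.3]])             # placeholder filter gain L(t) (frozen)
Sigma = 0.01 * np.eye(2)                  # covariance of x(0), illustrative

F = A - L @ C
G = B2 - L @ D
for _ in range(N):
    Sigma = Sigma + (F @ Sigma + Sigma @ F.T + G @ G.T) * dt

print("Sigma(T) =\n", Sigma)
```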

  13. (Linear feedback, cont.)

        u(t) = ∫_0^t F(t,τ) dy(τ)   ⇒   dy = dy_0 + ( ∫_0^t M(t,s) u(s) ds ) dt
                                     ⇒   dy = dy_0 + ( ∫_0^t N(t,τ) dy(τ) ) dt,

      where N(t,τ) = ∫_τ^t M(t,s) F(s,τ) ds.

      Volterra resolvent:  R(t,τ) = N(t,τ) + ∫_τ^t R(t,s) N(s,τ) ds.

      Then

        ∫_0^t N(t,τ) dy(τ) = ∫_0^t R(t,τ) dy_0(τ)
        ⇒  dy = dy_0 + ( ∫_0^t R(t,τ) dy_0(τ) ) dt
        ⇒  σ{ y(τ) ; 0 ≤ τ ≤ t } = σ{ y_0(τ) ; 0 ≤ τ ≤ t }.
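A discrete-time caricature of this resolvent argument (my illustration, with random placeholder data): if y = y_0 + N y with N strictly lower triangular, then I − N is invertible with a causal inverse, so y can be rewritten as y_0 plus a causal transformation of y_0 and the two processes generate the same information.

```python
# Discrete-time caricature of the resolvent argument (random placeholder data):
# if y = y0 + N y with N strictly lower triangular (strictly causal), then
# (I - N) is invertible, its inverse is causal, and y = y0 + R y0 with
# R = (I - N)^{-1} - I also strictly lower triangular, so the samples of y
# and y0 determine each other step by step (same generated information).
import numpy as np

rng = np.random.default_rng(1)
n = 6
N_mat = np.tril(rng.normal(size=(n, n)), k=-1)    # strictly causal map
y0 = rng.normal(size=n)

y = np.linalg.solve(np.eye(n) - N_mat, y0)        # the unique y with y = y0 + N y
R = np.linalg.inv(np.eye(n) - N_mat) - np.eye(n)  # discrete "Volterra resolvent"

print("R strictly lower triangular:", np.allclose(R, np.tril(R, k=-1)))
print("y recovered from y0 via R  :", np.allclose(y, y0 + R @ y0))
```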

  14. Example: Lipschitz continuous control

      [Wonham] Assuming that dy(t) = x(t) dt + D(t) dw(t), i.e., C(t) = I is invertible(!), then among control laws of the form u(t) = ψ(t, x̂(t)) the choice u(t) = K(t) x̂(t) is optimal.

      [Fleming & Rishel] removed the assumption on C(t); Lipschitz dependence on y; simpler proof.

  15. Example: Lipschitz (cont.)

      [Kushner]  ξ̂_0(t) := E{ x_0(t) | Y_t^0 } is given by the Kalman filter

        dξ̂_0 = A ξ̂_0(t) dt + L(t) dv_0,   ξ̂_0(0) = 0,
        dv_0 = dy_0 − C ξ̂_0(t) dt,        v_0(0) = 0.

      Define

        ξ̂(t) := ξ̂_0(t) + ∫_0^t Φ(t,s) B_1(s) u(s) ds

      and assume u(t) = ψ(t, ξ̂(t)) is Lipschitz. Then ξ̂ is the unique strong solution of

        dξ̂ = [ A ξ̂(t) + B_1 ψ(t, ξ̂(t)) ] dt + L(t) dv_0,   ξ̂(0) = 0.

      This choice forces u to be adapted to { Y_t^0 }  ⇒  { Y_t^0 } = { Y_t }  ⇒  ξ̂ = x̂.
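A discrete-time sketch of this construction (illustrative matrices, a hypothetical Lipschitz feedback ψ, and simulated innovation increments): the closed-loop estimator recursion agrees with the variation-of-constants formula ξ̂ = ξ̂_0 + ∫ Φ B_1 u ds driven by the same innovations.

```python
# Discrete-time sketch of the construction (illustrative matrices, hypothetical
# Lipschitz feedback psi, simulated innovation increments dv0): the closed-loop
# recursion for xi agrees with xi = xi0 + sum_s Phi^{k-1-s} B1 u[s] dt, where
# xi0 is the control-free estimate driven by the same innovations.
import numpy as np

rng = np.random.default_rng(2)
T, N = 1.0, 500
dt = T / N
A  = np.array([[0.0, 1.0], [-1.0, -0.5]])
B1 = np.array([[0.0], [1.0]])
L  = np.array([[0.8], [0.3]])                     # placeholder filter gain

def psi(k, xi):
    """A Lipschitz feedback: saturated linear state-estimate feedback."""
    return np.array([np.tanh(-0.5 * xi[0] - 0.8 * xi[1])])

dv0 = np.sqrt(dt) * rng.normal(size=(N, 1))       # simulated innovation increments

xi, xi0, u_hist = np.zeros(2), np.zeros(2), []
for k in range(N):
    u = psi(k, xi)
    u_hist.append(u)
    xi  = xi  + (A @ xi + B1 @ u) * dt + L @ dv0[k]   # closed-loop estimator
    xi0 = xi0 + (A @ xi0) * dt + L @ dv0[k]           # control-free estimate

# variation-of-constants reconstruction of xi(T) from xi0(T) and past controls
Phi = np.eye(2) + A * dt                          # one-step transition matrix
acc = np.zeros(2)
for k in range(N):
    acc = Phi @ acc + (B1 @ u_hist[k]) * dt
print("closed loop = xi0 + control term:", np.allclose(xi, xi0 + acc))
```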

  16. Example: delay in the loop

      When u(t) is a function of y(τ), 0 ≤ τ ≤ t − ε, we have Y_t = Y_t^0; the possibility of a control-dependent σ-field does not arise in the usual (predictive) discrete-time formulation.

      — Taking ε → 0 with general nonlinear feedback, there is no guarantee that Y_t is left-continuous.
      — "Proofs" of separation using such limits are circular; misleading accounts in textbooks.

  17. Signals and systems

      signals: sample paths, possibly having bounded discontinuities, in D (càdlàg – Skorokhod space)
      systems: measurable nonanticipatory maps

      Examples: i) SDEs that have strong solutions; ii) nonlinearities, hysteresis (C → D), etc.

      [Figure: a saturation-type nonlinearity h(z) with slope 1/ε; the system acts as z(t) → h(z(t)).]

  18. Well-posedness of feedback

      Defn. A feedback loop, that is z = z_0 + f(z), is well-posed if it has a unique solution in D for all z_0 ∈ D and (1 − f)^{-1} is a system.

      [Figure: z_0 enters a summing junction producing z; the feedback path f is the saturation h(z) (slope 1/ε) followed by a low-pass filter; then z(t) = (1 − f)^{-1} z_0(t).]
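A discrete-time sketch of this definition (nonlinearity, filter, and parameters all illustrative): with a strictly causal feedback f, here a one-step low-pass of a saturation h loosely mimicking the figure, z = z_0 + f(z) can be solved sample path by sample path, and the resulting map z_0 ↦ z is again nonanticipatory.

```python
# Discrete-time sketch of well-posedness (illustrative nonlinearity/parameters):
# with a strictly causal feedback f -- here a one-step low-pass filter of the
# saturating nonlinearity h -- the loop z = z0 + f(z) has a unique solution,
# computable sample path by sample path, and z0 -> z is again nonanticipatory.
import numpy as np

def h(v, eps=0.1):
    """Saturation-type nonlinearity with slope 1/eps, as in the h(z) sketch."""
    return np.clip(v / eps, -1.0, 1.0)

def solve_loop(z0, alpha=0.5):
    """Unique solution of z = z0 + f(z); f = one-step low-pass of h(z)."""
    z = np.zeros_like(z0)
    lp = 0.0                               # low-pass state: depends on z[0..k-1]
    for k in range(len(z0)):
        z[k] = z0[k] + lp                  # f(z)[k] uses only strictly past z
        lp = (1.0 - alpha) * lp + alpha * h(z[k])
    return z

rng = np.random.default_rng(3)
z0 = np.cumsum(0.2 * rng.normal(size=50))  # a sample path of the exogenous input
print("z(0..4) =", solve_loop(z0)[:5])
```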

  19. Well-posedness (cont.)

      By definition, z and z_0 are stochastic processes. Well-posedness implies that Z_t^0 = Z_t, t ∈ [0,T]:
      (1 − f) and (1 − f)^{-1} are systems, so z_0 = z − f(z) and z = (1 − f)^{-1} z_0.

      NB. — no more information other than what is contained in Z_t^0

  20. How about incomplete state information?  (z is observed through y = Hz)

        z_1 = [ w ; 0 ]   and   z_2 = [ 0 ; w ]

      generate the same filtrations, i.e., Z_t^1 = Z_t^2, while for H = [ 1  0 ],

        y_1 = [ 1  0 ] [ w ; 0 ] = w   and   y_2 = [ 1  0 ] [ 0 ; w ] = 0

      do not, i.e., Y_t^1 ≠ Y_t^2.

  21. Linear read-out map (block diagram: z_0 → + → z → H → y → π → u → g → +)

      Assume that

        z(t) = z_0(t) + ( g ∘ π (y) )(t),
        y(t) = H z(t)

      is well-posed. With H linear, it follows that Y_t = Y_t^0, t ∈ [0,T].

  22. Proof:

        (1 − Hgπ) H = H − HgπH = H (1 − gπH)
        ⇒  H (1 − gπH)^{-1} = (1 − Hgπ)^{-1} H
        ⇒  y = (1 − Hgπ)^{-1} y_0   and   y_0 = (1 − Hgπ) y.
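In the purely linear case the operator identities above reduce to matrix algebra; a quick numerical check with random placeholder matrices H, G, Π standing in for H, g, π:

```python
# Numerical check of the lemma's algebra in the purely linear case, with random
# placeholder matrices H, G, Pi standing in for H, g, pi:
#   (I - H G Pi)^{-1} H = H (I - G Pi H)^{-1}.
import numpy as np

rng = np.random.default_rng(4)
n_z, n_y = 5, 2
H  = rng.normal(size=(n_y, n_z))
G  = 0.1 * rng.normal(size=(n_z, n_y))     # scaled so both inverses exist
Pi = 0.1 * rng.normal(size=(n_y, n_y))

lhs = np.linalg.solve(np.eye(n_y) - H @ G @ Pi, H)
rhs = H @ np.linalg.inv(np.eye(n_z) - G @ Pi @ H)
print("identity holds:", np.allclose(lhs, rhs))
```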

  23. Essence of the lemma: well-posedness resolves the issue of circular control dependence.

      [Figure: the closed loop (z = z_0 + g(u), y = Hz, u = π(y)) is equivalent (≃) to a configuration in which the controller π is driven by y_0 = H z_0.]

  24. The separation principle

      Thm: Assume

        dx = A(t) x(t) dt + B_1(t) u(t) dt + B_2(t) dw
        dy = C(t) x(t) dt + D(t) dw

      where w(t) is a vector-valued Wiener process, x(0) is a Gaussian random vector independent of w(t), y(0) = 0, and A, B_1, B_2, C, D are matrix-valued functions. Then there is a unique control law π : y ↦ u minimizing

        J(u) = E{ ∫_0^T x(t)' Q(t) x(t) dt + ∫_0^T u(t)' R(t) u(t) dt + x(T)' S x(T) }

      in the class of well-posed control laws, and it has the form u(t) = K(t) x̂(t).
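Putting the pieces together, here is a discrete-time sketch of the separated controller in closed loop (my illustration): an estimator produces x̂ and the applied control is u = K x̂. Constant placeholder gains K and L are used for brevity instead of the time-varying Riccati solutions, and all matrices are illustrative.

```python
# Sketch of the separated controller in closed loop (discrete time, illustrative
# matrices and noise levels; constant placeholder gains K and L are used instead
# of the time-varying Riccati solutions): an estimator produces xhat and the
# applied control is u = K xhat.
import numpy as np

rng = np.random.default_rng(5)
T, N = 1.0, 500
dt = T / N
A  = np.array([[0.0, 1.0], [-1.0, -0.5]])
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[0.1, 0.0], [0.0, 0.1]])
C  = np.array([[1.0, 0.0]])
D  = np.array([[0.0, 0.2]])
K  = np.array([[-1.2, -0.9]])              # placeholder control gain
L  = np.array([[0.8], [0.3]])              # placeholder filter gain

x    = 0.1 * rng.normal(size=2)            # Gaussian x(0), zero mean here
xhat = np.zeros(2)                         # xhat(0) = E x(0) = 0

for k in range(N):
    u  = K @ xhat                          # separated control: u = K xhat
    dw = np.sqrt(dt) * rng.normal(size=2)
    dy = (C @ x) * dt + D @ dw             # observation increment
    # estimator: dxhat = A xhat dt + B1 u dt + L (dy - C xhat dt)
    xhat = xhat + (A @ xhat + B1 @ u) * dt + L @ (dy - (C @ xhat) * dt)
    # plant:     dx = A x dt + B1 u dt + B2 dw
    x = x + (A @ x + B1 @ u) * dt + B2 @ dw

print("x(T)    =", x)
print("xhat(T) =", xhat)
```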

  25. The separation principle (general)

      Thm: For the same linear system, assuming w is a semimartingale and x(0) an independent random vector, the unique optimal control in the class of well-posed controllers is given by u(t) = K(t) x̂(t), where x̂ is the conditional mean.

      Remarks:
      — no need for Lipschitz continuity
      — allows jump processes
      — K(t) is still given by a Riccati equation
      — in general, the difficult part is constructing x̂(t) = E{ x(t) | Y_t }.
