The separation principle in stochastic control, revisited

Workshop in honor of Eduardo Sontag on the occasion of his 60th birthday

Tryphon T. Georgiou, joint work with Anders Lindquist
linear stochastic system

(block diagram: the noise w drives the plant; the controller π maps the observation y to the control u)

  dx = A(t)x(t)dt + B_1(t)u(t)dt + B_2(t)dw
  dy = C(t)x(t)dt + D(t)dw

w(t) is a vector-valued Wiener process
x(0) is a Gaussian random vector independent of w(t), y(0) = 0
A, B_1, B_2, C, D are matrix-valued functions

Goal: design a nonanticipatory control π : y ↦ u that minimizes

  J(u) = E{ ∫_0^T x(t)′Q(t)x(t) dt + ∫_0^T u(t)′R(t)u(t) dt + x(T)′S x(T) }
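For concreteness, a minimal Euler–Maruyama simulation of these two equations (not part of the original slides); the particular matrices, time step, and the choice u ≡ 0 are placeholder assumptions.

    import numpy as np

    # Euler-Maruyama discretization of
    #   dx = A x dt + B1 u dt + B2 dw,   dy = C x dt + D dw
    rng = np.random.default_rng(0)
    T, N = 1.0, 1000
    dt = T / N

    A  = np.array([[0.0, 1.0], [-1.0, -0.5]])
    B1 = np.array([[0.0], [1.0]])
    B2 = 0.2 * np.eye(2)
    C  = np.array([[1.0, 0.0]])
    D  = np.array([[0.05, 0.0]])

    x = rng.standard_normal(2)        # Gaussian initial state
    y = np.zeros(1)                   # y(0) = 0
    for _ in range(N):
        u  = np.zeros(1)              # open loop (u = 0), just to exercise the model
        dw = np.sqrt(dt) * rng.standard_normal(2)
        dx = (A @ x + B1 @ u) * dt + B2 @ dw
        dy = C @ x * dt + D @ dw
        x, y = x + dx, y + dy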
separation principle

Under suitable assumptions on the class of admissible control laws π : y ↦ u, the "optimal control" is

  u(t) = K(t)x̂(t),   where x̂(t) = E{x(t) | Y_t},
  dx̂ = A(t)x̂(t)dt + B_1(t)u(t)dt + L(t)(dy − C(t)x̂(t)dt),   x̂(0) = 0,

with K(t) and L(t) computed via a pair of dual Riccati equations.

NB:
— attempts to prove separation for u(t) that is Y_t-measurable (a.s.) ...
— too big a class; we know of no proof which is correct (strong solutions)
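A minimal sketch (not from the slides) of the certainty-equivalence controller above, discretized by Euler–Maruyama; the gain schedules K and L are assumed to have been precomputed from the dual Riccati equations (see the Riccati sketch after the "completion of squares" slide), and dy_increments is a hypothetical array of observed increments.

    import numpy as np

    def separated_controller(A, B1, C, K, L, dy_increments, dt):
        """Certainty-equivalence control u = K(t) xhat(t), with xhat propagated by
        the filter  dxhat = A xhat dt + B1 u dt + L (dy - C xhat dt),  xhat(0) = 0.
        K and L are sequences of gain matrices on the time grid, assumed to come
        from the dual Riccati equations."""
        xhat = np.zeros(A.shape[0])
        u_path = []
        for k, dy in enumerate(dy_increments):
            u = K[k] @ xhat                        # feedback on the conditional mean
            innovation = dy - C @ xhat * dt        # innovation increment
            xhat = xhat + (A @ xhat + B1 @ u) * dt + L[k] @ innovation
            u_path.append(u)
        return np.array(u_path)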
historical remarks

Wonham, Kushner, Lindquist, Fleming & Rishel

• treatment overburdened with technicalities
• folk accounts not supported by existing proofs
• the non-Gaussian nature due to an a priori nonlinear π is often overlooked
• herein, a separation principle for:
  – the most natural class of controls: all linear/nonlinear and even discontinuous ones such that the feedback loop makes "engineering" sense
  – the engineering viewpoint: signals = sample functions
  – general semimartingale driving noise, with jumps
  – delay-differential linear systems, etc.
the standard "completion of squares"

  J(u) = E{ x(0)′P(0)x(0) + ∫_0^T (u − Kx)′R(u − Kx) dt + ∫_0^T tr(B_2′PB_2) dt }

where

  Ṗ = −A′P − PA + PB_1R^{-1}B_1′P − Q,   P(T) = S,
  K(t) := −R(t)^{-1}B_1(t)′P(t).

Using Itô's rule:

  d(x′Px) = x′Ṗx dt + 2x′P dx + tr(B_2′PB_2) dt
          = [ −x′Qx − u′Ru + (u − Kx)′R(u − Kx) + tr(B_2′PB_2) ] dt + 2x′PB_2 dw

With "complete state information": u_optimal(t) = K(t)x(t)
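A minimal sketch (not in the slides) of how K(t) would be obtained numerically: integrate the Riccati equation above backward from P(T) = S with a plain Euler step; constant matrices and a uniform grid are simplifying assumptions.

    import numpy as np

    def control_riccati_gains(A, B1, Q, R, S, T, N):
        """Integrate  Pdot = -A'P - PA + P B1 R^{-1} B1' P - Q,  P(T) = S,
        backward on a uniform grid and return the gains K(t) = -R^{-1} B1' P(t)."""
        dt = T / N
        Rinv = np.linalg.inv(R)
        P = S.copy()
        K = [None] * (N + 1)
        K[N] = -Rinv @ B1.T @ P
        for k in range(N - 1, -1, -1):
            Pdot = -A.T @ P - P @ A + P @ B1 @ Rinv @ B1.T @ P - Q
            P = P - dt * Pdot                 # one Euler step backward in time
            K[k] = -Rinv @ B1.T @ P
        return K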
incomplete state information

u(t) needs to be a function of {y(s); 0 ≤ s ≤ t}

Standard recipe: u(t) = K(t)x̂(t), where x̂(t) = E{x(t) | Y_t}

justification ⇔ separation theorem
where is the potential problem?

set x̃(t) := x(t) − x̂(t); then

  E ∫_0^T (u − Kx)′R(u − Kx) dt = E ∫_0^T (u − Kx̂)′R(u − Kx̂) dt + ∫_0^T tr(K′RK Σ) dt,

since E{[u(t) − K(t)x̂(t)] x̃(t)′} = 0, and where Σ(t) := E{x̃(t)x̃(t)′}

why isn't it obvious that u = Kx̂ is optimal?

subtlety: in general, Σ may depend on the control
source of fallacy (?)

due to linearity,

  x(t) = x_0(t) + ∫_0^t Φ(t,s)B_1(s)u(s) ds,

the control term cancels out:

  x̃(t) = x̃_0(t) := x_0(t) − x̂_0(t),   where x̂_0(t) := E{x_0(t) | Y_t}

how could E{x̃_0(t)x̃_0(t)′} depend on the control?

because the filtration Y_t, and hence x̂_0, might depend on u!
— u is in general a nonlinear function of y
— hence, y may not be Gaussian
— despite the fact that x_0 is Gaussian, x̂_0(t) = E{x_0(t) | Y_t} may not be linear in the data {y(τ); τ ∈ [0, t]}
— x̂_0(t) may not be given by a Kalman filter.
generalization - notation

(block diagram: z_0 enters additively, π maps y to u, g maps u back into z, H reads y out of z)

  z(t) = z_0(t) + ∫_0^t G(t,τ)u(τ) dτ
  y(t) = Hz(t)

where

  g : (t, u) ↦ ∫_0^t G(t,τ)u(τ) dτ

E.g., z(t) = [x(t); y(t)] and H = [0, I]
ways out (?)

SOL: stochastic open loop (Lindquist)
restrict the control so as to be adapted to {Y_t^0}

(block diagrams: the control-free system, whose output is y_0 = Hz_0, alongside the closed loop driven through π)

examples:
— linear control
— Lipschitz feedback
e.g., control adapted to {Y_t^0} via:

(block diagram: y_0 is reconstructed inside the controller by subtracting Hg(u) from y, and π is driven by y_0)
example: linear feedback

  u(t) = u_deterministic(t) + ∫_0^t F(t,τ) dy(τ)

then the Gaussian character is preserved, and it can be shown that Y_t = Y_t^0. Hence

  dx̃ = (A − LC)x̃ dt + (B_2 − LD) dw,   x̃(0) = x(0),

and Σ(t) := E{x̃(t)x̃(t)′} is independent of u
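A small numerical illustration (not in the slides) of why Σ is control-independent here: the error covariance obeys the Lyapunov ODE Σ̇ = (A − LC)Σ + Σ(A − LC)′ + (B_2 − LD)(B_2 − LD)′, in which u never appears; forward Euler and constant matrices are simplifying assumptions.

    import numpy as np

    def error_covariance(A, B2, C, D, L, Sigma0, T, N):
        """Propagate Sigma(t) = E{ xtilde xtilde' } for the error dynamics
        dxtilde = (A - L C) xtilde dt + (B2 - L D) dw.  The control u does not
        enter anywhere, which is the point of the linear-feedback example."""
        dt = T / N
        F, G = A - L @ C, B2 - L @ D
        Sigma = Sigma0.copy()
        for _ in range(N):
            Sigma = Sigma + dt * (F @ Sigma + Sigma @ F.T + G @ G.T)
        return Sigma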
  u(t) = ∫_0^t F(t,τ) dy(τ)   ⇒   dy = dy_0 + (∫_0^t M(t,s)u(s) ds) dt
                              ⇒   dy = dy_0 + (∫_0^t N(t,τ) dy(τ)) dt,

where N(t,τ) = ∫_τ^t M(t,s)F(s,τ) ds

Volterra resolvent:   R(t,τ) = ∫_τ^t R(t,s)N(s,τ) ds + N(t,τ)

Then

  ∫_0^t N(t,τ) dy(τ) = ∫_0^t R(t,τ) dy_0(τ)
  ⇒ dy = dy_0 + (∫_0^t R(t,τ) dy_0(τ)) dt
  ⇒ σ{y(τ); 0 ≤ τ ≤ t} = σ{y_0(τ); 0 ≤ τ ≤ t}
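A discrete-time sanity check of the resolvent argument (not in the slides): a random strictly lower-triangular matrix stands in for the causal kernel N(t,τ), and the discrete analogue of the resolvent is R = N(I − Δt·N)^{-1}.

    import numpy as np

    # dy = dy0 + dt * (N applied to past dy)  is inverted by the resolvent
    # R = N (I - dt N)^{-1}, giving  dy = dy0 + dt * (R applied to past dy0),
    # so y and y0 carry the same information.
    rng = np.random.default_rng(1)
    n, dt = 50, 0.02
    N = np.tril(rng.standard_normal((n, n)), k=-1)     # causal kernel N(t, tau)
    dy0 = rng.standard_normal(n)                       # increments of y0

    dy = np.linalg.solve(np.eye(n) - dt * N, dy0)      # solves dy = dy0 + dt * N @ dy
    R = N @ np.linalg.inv(np.eye(n) - dt * N)          # discrete Volterra resolvent
    print(np.allclose(dy, dy0 + dt * R @ dy0))         # True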
example: Lipschitz continuous control

[Wonham] Assuming that dy(t) = x(t)dt + D(t)dw(t), i.e., C(t) = I is invertible, then among control laws of the form u(t) = ψ(t, x̂(t)) the choice u(t) = K(t)x̂(t) is optimal.

[Fleming & Rishel] removed the assumption on C(t); Lipschitz dependence on y; simpler proof.
example: Lipschitz (cont.)

[Kushner]  ξ̂_0(t) := E{x_0(t) | Y_t^0} is given by the Kalman filter

  dξ̂_0 = Aξ̂_0(t)dt + L(t)dv_0,   ξ̂_0(0) = 0,
  dv_0 = dy_0 − Cξ̂_0(t)dt,        v_0(0) = 0

define

  ξ̂(t) := ξ̂_0(t) + ∫_0^t Φ(t,s)B_1(s)u(s) ds

and assume u(t) = ψ(t, ξ̂(t)) is Lipschitz. Then ξ̂ is the unique strong solution of

  dξ̂ = [Aξ̂(t) + B_1ψ(t, ξ̂(t))] dt + L(t)dv_0,   ξ̂(0) = 0.

This choice forces u to be adapted to {Y_t^0}  ⇒  {Y_t^0} = {Y_t}  ⇒  ξ̂ = x̂
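A minimal sketch (not in the slides) of the strong solution being invoked: Euler–Maruyama applied to the ξ̂ equation, driven only by the control-free innovation increments dv_0; the Lipschitz map psi and the arrays dv0, L are placeholders.

    import numpy as np

    def xi_hat_path(A, B1, L, psi, dv0, dt):
        """Euler-Maruyama solution of
        d xihat = [A xihat + B1 psi(t, xihat)] dt + L(t) dv0,   xihat(0) = 0.
        The recursion uses only the control-free innovations dv0, which is what
        makes u = psi(t, xihat) adapted to the sigma-field generated by y0."""
        xihat = np.zeros(A.shape[0])
        path = [xihat.copy()]
        for k, dv in enumerate(dv0):
            drift = A @ xihat + B1 @ psi(k * dt, xihat)
            xihat = xihat + drift * dt + L[k] @ dv
            path.append(xihat.copy())
        return np.array(path)

    # e.g., a saturated (hence Lipschitz) feedback:  psi = lambda t, xi: -np.tanh(xi[:1])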
example: delay in the loop

when u(t) is a function of {y(τ); 0 ≤ τ ≤ t − ε}, Y_t = Y_t^0

the possibility of a control-dependent σ-field does not arise in the usual (predictive) discrete-time formulation

— taking ε → 0 with general nonlinear feedback, there is no guarantee that Y_t is left-continuous
— "proofs" of separation using such limits are circular; misleading accounts in textbooks
signals and systems

signals: sample paths, possibly having bounded discontinuities, in D (càdlàg – Skorokhod space)

systems: measurable nonanticipatory maps

examples:
 i) SDEs that have strong solutions
 ii) nonlinearities, hysteresis (C → D), etc.

(figure: a saturation-type nonlinearity h with slope 1/ε, acting pointwise as z(t) → h(z(t)))
well-posedness of feedback

Defn. a feedback loop, that is z = z_0 + f(z), is well-posed if it has a unique solution z = (1 − f)^{-1}z_0 in D for all z_0 ∈ D, and (1 − f)^{-1} is a system.

(block diagram: f = saturation h with slope 1/ε followed by a low-pass filter; z(t) = (1 − f)^{-1}z_0(t))
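A discrete-time caricature (not in the slides) of why such loops are well-posed: when f is strictly causal, z = z_0 + f(z) can be solved sample by sample, and the solution map (1 − f)^{-1} is again nonanticipatory; the particular saturation and input below are placeholder choices.

    import numpy as np

    def solve_loop(z0, f_step):
        """Solve z = z0 + f(z) sample-by-sample when f is strictly causal:
        the k-th output of f depends only on z[0..k-1], so the solution is
        unique and (1 - f)^{-1} is itself a nonanticipatory map."""
        z = np.zeros_like(z0)
        for k in range(len(z0)):
            z[k] = z0[k] + f_step(z[:k])        # f sees only strictly past samples
        return z

    # placeholder f: a one-step delay of the saturation h(z) = clip(z / eps, -1, 1)
    eps = 0.1
    f_step = lambda past: float(np.clip(past[-1] / eps, -1.0, 1.0)) if len(past) else 0.0
    z = solve_loop(np.sin(np.linspace(0.0, 6.0, 60)), f_step)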
well-posedness (cont.)

z, z_0 stochastic processes

by defn, (1 − f) and (1 − f)^{-1} are systems
⇒ z_0 = z − f(z) and z = (1 − f)^{-1}z_0

hence well-posedness implies that

  Z_t^0 = Z_t,   t ∈ [0, T].

NB: — no more information other than what is contained in Z_t^0
how about incomplete state-information?

(diagram: the read-out y = Hz)

  z_1 = [w; 0]   and   z_2 = [0; w]

generate the same filtrations, i.e., Z_t^1 = Z_t^2,

while for H = [1  0],

  y_1 = [1  0][w; 0] = w   and   y_2 = [1  0][0; w] = 0

do not, i.e., Y_t^1 ≠ Y_t^2.
linear read-out map

(block diagram: z_0 enters additively; u = π(y), z = z_0 + g(u), y = Hz)

Assume

  z(t) = z_0(t) + g∘π(y(t))
  y(t) = Hz(t)

is well-posed. With H linear, it follows that Y_t = Y_t^0, t ∈ [0, T].
Proof:

  (1 − Hgπ)H = H − HgπH = H(1 − gπH)
  ⇒ H(1 − gπH)^{-1} = (1 − Hgπ)^{-1}H
  ⇒ y = (1 − Hgπ)^{-1}y_0   and   y_0 = (1 − Hgπ)y.
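A finite-dimensional sanity check (not in the slides) of the push-through identity used in this proof; small random matrices stand in for the operators H, g, π.

    import numpy as np

    # check  (I - H g pi)^{-1} H = H (I - g pi H)^{-1}  on random matrices
    rng = np.random.default_rng(2)
    n, m, p = 5, 3, 2                       # dimensions of z, y, u
    H  = rng.standard_normal((m, n))
    g  = 0.1 * rng.standard_normal((n, p))
    pi = 0.1 * rng.standard_normal((p, m))

    lhs = np.linalg.solve(np.eye(m) - H @ g @ pi, H)
    rhs = H @ np.linalg.inv(np.eye(n) - g @ pi @ H)
    print(np.allclose(lhs, rhs))            # True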
essence of the lemma

well-posedness resolves the issue of circular control dependence

(block diagrams: the closed loop in which π acts on y = Hz is equivalent to one in which π is driven by y_0 = Hz_0)
the separation principle

thm: assuming

  dx = A(t)x(t)dt + B_1(t)u(t)dt + B_2(t)dw
  dy = C(t)x(t)dt + D(t)dw

where w(t) is a vector-valued Wiener process, x(0) is a Gaussian random vector independent of w(t), y(0) = 0, and A, B_1, B_2, C, D are matrix-valued functions, there is a unique control law π : y ↦ u minimizing

  J(u) = E{ ∫_0^T x(t)′Q(t)x(t) dt + ∫_0^T u(t)′R(t)u(t) dt + x(T)′Sx(T) }

in the class of well-posed control laws, and it has the form

  u(t) = K(t)x̂(t)
the separation principle (general)

thm: for the same linear system, assuming w is a semimartingale and x(0) an independent random vector, the unique optimal control in the class of well-posed controllers is given by

  u(t) = K(t)x̂(t)

where x̂ is the conditional mean.

remarks:
— no need for Lipschitz continuity
— allows jump processes
— K(t) is still given by a Riccati equation
— in general, the difficult part is constructing x̂(t) = E{x(t) | Y_t}.