On the relation between certain stochastic control problems and probabilistic inference

Manfred Opper, Computer Science
• Control problem → inference problem (Bert Kappen & Emanuel Todorov), via the Bellman equation
• Inference problem → control problems, via a variational method
• Things that could be learnt from this and possible extensions
Discrete time: MDPs

• Assume a Markov process with transition probabilities q(x′|x, u) tuned by a 'control' variable u.
• Try to minimise the total expected cost
    V_0(x) = ∑_{t=0}^{T} E_u[ L_t(X_t, u_t) | X_0 = x ]
• Define the 'value' of state x:
    V_t(x) = ∑_{τ ≥ t} E_u[ L_τ(X_τ, u_τ) | X_t = x ]
• Solution via the Bellman equation:
    V_t(x) = min_u [ L_t(x, u) + ∑_{x′} q_t(x′|x, u) V_{t+∆t}(x′) ]
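The Bellman equation above can be solved by a backward recursion over t. A minimal numerical sketch, using hypothetical random costs and transition probabilities on a tiny state space (all names and sizes here are illustrative, not from the slides):

```python
import numpy as np

# Backward Bellman recursion: V_t(x) = min_u [ L_t(x,u) + sum_x' q(x'|x,u) V_{t+1}(x') ]
n_states, n_controls, T = 2, 2, 5
rng = np.random.default_rng(0)

# q[u, x, x'] : transition probabilities, each row normalised
q = rng.random((n_controls, n_states, n_states))
q /= q.sum(axis=2, keepdims=True)
# L[t, x, u] : nonnegative stage costs L_t(x, u)
L = rng.random((T + 1, n_states, n_controls))

V = L[T].min(axis=1)                      # terminal value: V_T(x) = min_u L_T(x,u)
for t in range(T - 1, -1, -1):
    # Q[x, u] = L_t(x,u) + E_q[ V_{t+1}(X') | x, u ]
    Q = L[t] + np.einsum('uxy,y->xu', q, V)
    V = Q.min(axis=1)                     # minimise over the control u
```

After the loop, V holds the minimal expected costs V_0(x) for each start state.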
Continuous time: SDEs

• (Itô) stochastic differential equation for the state X(t) ∈ R^d:
    dX(t) = ( u(X(t), t) + f(X(t)) ) dt + D^{1/2}(X(t)) dW(t)
  with drift u + f and diffusion D^{1/2} dW; W(t) is a vector of independent Wiener processes.
• Limit of the discrete-time process X_k:
    ∆X_k ≡ X_{k+1} − X_k = ( u_t + f(X_k) ) ∆t + D^{1/2}(X_k) √∆t ε_k,
  with ε_k i.i.d. Gaussian.
Continuous time ctd

• Try to minimise the total expected cost
    V_0(x) = ∫_0^T E_u[ L_t(X(t), u(t)) | X(0) = x ] dt
• Define the 'value' of state x:
    V_t(x) = ∫_t^T E_u[ L_s(X(s), u(s)) | X(t) = x ] ds
• Solution via the Hamilton–Jacobi–Bellman equation:
    − ∂V_t(x)/∂t = min_u [ L_t(x, u) + (u + f)^⊤ ∇V_t(x) + ½ Tr(D ∇∇^⊤) V_t(x) ]
• Specialise to L_t(x, u) = ½ u(t)^⊤ R u(t) + U(x(t), t).
• Minimisation leads to u_t = − R^{−1} ∇V_t
• and a nonlinear PDE:
    − ∂V_t(x)/∂t = − ½ (∇V_t)^⊤ R^{−1} (∇V_t) + f^⊤(x) ∇V_t + ½ Tr(D ∇∇^⊤) V_t + U(x, t)
Exact linearisation (Kappen, 2005)

• Assume that D = R^{−1}. Using the transformation V_t(x) = − ln Z_t(x) we get the equation
    ( ∂/∂t + L† ) Z_t(x) = 0
  with the linear operator
    L† = f^⊤ ∇ + ½ Tr(D ∇∇^⊤) − U(x, t)
• and a path-integral representation
    Z_t(x) = E_{u=0}[ e^{ − ∫_t^T U(X(τ), τ) dτ } | X(t) = x ]

Now all kinds of inference tricks apply!
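The path-integral representation can be estimated by plain Monte Carlo: simulate the uncontrolled dynamics and average e^{−∫U}. A minimal sketch with hypothetical choices f = 0, D = 1, U(x) = x²/2 on [0, T] (none of these specifics are from the slides):

```python
import numpy as np

# Monte Carlo estimate of Z_0(0) = E_{u=0}[ exp(-int_0^T U(X(t)) dt) | X(0) = 0 ]
rng = np.random.default_rng(1)
T, dt, n_paths = 1.0, 0.01, 2000
n_steps = int(T / dt)
U = lambda x: 0.5 * x**2              # hypothetical state cost

x = np.zeros(n_paths)                 # all paths start at x = 0
cost = np.zeros(n_paths)              # running integral of U along each path
for _ in range(n_steps):
    cost += U(x) * dt
    # uncontrolled Euler-Maruyama step with f = 0, D = 1
    x += np.sqrt(dt) * rng.standard_normal(n_paths)

Z0 = np.exp(-cost).mean()             # path-integral estimate of Z_0(0)
```

Since U ≥ 0, the estimate must lie strictly between 0 and 1; importance sampling or other inference tricks can reduce its variance.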
Todorov's solvable MDPs (2006)

    L_t(x, u) = ∑_{x′} q(x′|x, u) ln [ q(x′|x, u) / p(x′|x) ] + U_t(x)

Bellman equation:

    V_t(x) = min_u { U_t(x) + ∑_{x′} q(x′|x, u) [ ln( q(x′|x, u) / p(x′|x) ) + V_{t+∆t}(x′) ] }

The controlled transition probabilities:

    q(x′|x, u) ∝ p(x′|x) e^{ − V_{t+∆t}(x′) }

With V_t(x) = − ln Z_t(x) we get the linear equation (Todorov, 2006):

    Z_t(x) = e^{ − U_t(x) } ∑_{x′} p(x′|x) Z_{t+∆t}(x′)
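The linear backward recursion for Z_t, together with the controlled transitions q ∝ p·e^{−V}, is easy to run on a small chain. A minimal sketch with hypothetical random passive dynamics p and state costs U (sizes and seed are illustrative):

```python
import numpy as np

n, T = 3, 4
rng = np.random.default_rng(2)
# passive dynamics p(x'|x), rows normalised
p = rng.random((n, n))
p /= p.sum(axis=1, keepdims=True)
# positive state costs U_t(x)
U = rng.random((T + 1, n)) + 0.1

Z = np.exp(-U[T])                        # terminal desirability Z_T(x) = e^{-U_T(x)}
for t in range(T - 1, -1, -1):
    # controlled transitions q(x'|x) propto p(x'|x) e^{-V_{t+1}(x')} = p(x'|x) Z_{t+1}(x')
    q = p * Z[None, :]
    q /= q.sum(axis=1, keepdims=True)
    Z = np.exp(-U[t]) * (p @ Z)          # linear recursion: Z_t = e^{-U_t} sum_x' p Z_{t+1}

V0 = -np.log(Z)                          # minimal expected cost V_0(x) = -ln Z_0(x)
```

No minimisation over u is needed anywhere: the min is absorbed into the exponential transformation.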
Relation to continuous case

• Short-time transition probability
    p(x′, t + ∆t | x, t) ∝ exp( − (1 / 2∆t) ‖ ∆x − f(x)∆t ‖²_D )
  as ∆t → 0, with ‖F‖²_D = F^⊤ D^{−1} F.
• Let p_g and p_f be the short-time transition probabilities for diffusion processes with drifts g and f and the same diffusion D. Then
    ∫ p_g(x′, t + ∆t | x, t) ln [ p_g(x′, t + ∆t | x, t) / p_f(x′, t + ∆t | x, t) ] dx′ ≃ ½ ‖ g(x, t) − f(x, t) ‖²_D ∆t
The KL divergence for Markov processes

Consider probabilities p(X_{0:T}), q(X_{0:T}) over entire paths X_{0:T}. The total KL divergence ...

    KL[ q(X_{0:T}) ‖ p(X_{0:T}) ] = ∫ dx_{0:T} q(x_{0:T}) ln [ q(x_{0:T}) / p(x_{0:T}) ]
      = ∑_{k=0}^{T−1} ∫ dx_k q(x_k) ∫ dx_{k+1} q(x_{k+1}|x_k) ln [ q(x_{k+1}|x_k) / p(x_{k+1}|x_k) ]
      = ∑_{k=0}^{T−1} ∫ dx_k q(x_k) KL[ q(·|x_k) ‖ p(·|x_k) ]

(assuming equal initial distributions, q(x_0) = p(x_0))

... is the expected sum of KLs for the transition probabilities.
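This chain-rule decomposition can be checked numerically. A minimal sketch on a hypothetical 2-state chain of length 2 with a shared initial distribution (all numbers are illustrative), comparing the brute-force path KL with the sum of expected per-transition KLs:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(5)
init = np.array([0.5, 0.5])                       # shared initial distribution
qT = rng.random((2, 2)); qT /= qT.sum(axis=1, keepdims=True)   # q(x_{k+1}|x_k)
pT = rng.random((2, 2)); pT /= pT.sum(axis=1, keepdims=True)   # p(x_{k+1}|x_k)

# direct KL over all length-3 trajectories (x_0, x_1, x_2)
kl_path = 0.0
for x0, x1, x2 in product(range(2), repeat=3):
    q_prob = init[x0] * qT[x0, x1] * qT[x1, x2]
    p_prob = init[x0] * pT[x0, x1] * pT[x1, x2]
    kl_path += q_prob * np.log(q_prob / p_prob)

# expected sum of per-transition KLs under the q-marginals
marg, kl_sum = init.copy(), 0.0
for _ in range(2):
    step_kl = (qT * np.log(qT / pT)).sum(axis=1)  # KL[q(.|x_k) || p(.|x_k)] per state
    kl_sum += marg @ step_kl
    marg = marg @ qT                              # advance the q-marginal
```

The two quantities agree to machine precision, as the decomposition promises.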
The global solution

The Kappen/Todorov control problems are of the form: minimise the variational free energy

    V_t(x) = KL[ q(X_{t:T}) ‖ p(X_{t:T}) ] + E_q[ ∑_{τ ≥ t} U_τ(X_τ, τ) ]

for fixed X_t = x with respect to q. The optimal controlled probability over paths is

    q*(X_{t:T}) = (1 / Z_t(x)) p(X_{t:T}) e^{ − ∑_{τ ≥ t} U_τ(X_τ, τ) }

with the minimal cost (free energy)

    V_t(x) = − ln Z_t(x) = − ln E_p[ e^{ − ∑_{τ ≥ t} U_τ(X_τ, τ) } | X_t = x ]

This looks like an HMM with 'likelihood' e^{ − ∑_{τ ≥ t} U_τ(X_τ, τ) }.
Comments

• For HMMs,
    Z_t(x) = E_p[ e^{ − ∑_{τ ≥ t} U_τ(X_τ, τ) } | X_t = x ] ∝ P(future data | X_t = x)
  fulfils a linear backward equation.
• The posterior = controlled process has transition probabilities
    q(x_{t+1}|x_t) = [ Z_{t+1}(x_{t+1}) / Z_t(x_t) ] p(x_{t+1}|x_t) e^{ − U_t(x_t, t) }
• Similar things happen in the continuous case:
    ( ∂/∂t + f^⊤ ∇ + ½ Tr(D ∇∇^⊤) − U(x, t) ) Z_t(x) = 0
• The posterior is a diffusion with 'controlled' drift u_t(x) = D ∇ ln Z_t(x).
A 'real' likelihood for continuous-time paths

Consider an inhomogeneous Poisson process with rate function U_t(X_t). Then

    Pr{ no event in [0, T] } = e^{ − ∫_0^T U_s(X_s, s) ds }
Application: simulate diffusions with constraints

Wiener process with fixed endpoint X(T) = 0.
Solution

    u_t(x) = ∂ ln Z_t(x) / ∂x

The backward equation

    ∂Z_t(x)/∂t + ½ ∂²Z_t(x)/∂x² = 0,   Z_T(x) = δ(x)

is solved by

    Z_t(x) ∝ e^{ − x² / 2(T − t) }

and leads to

    u_t(x) = − x / (T − t)   for 0 < t < T.
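The controlled drift −x/(T−t) pins the Wiener process to 0 at time T (a Brownian bridge). A minimal Euler–Maruyama sketch (step size and path count are illustrative; the loop stops one step before T to avoid the singularity in the drift):

```python
import numpy as np

rng = np.random.default_rng(3)
T, dt, n_paths = 1.0, 0.001, 500
n_steps = int(T / dt)

x = np.zeros(n_paths)                 # start at x = 0
for k in range(n_steps - 1):          # stop one step early: drift blows up at t = T
    t = k * dt
    drift = -x / (T - t)              # controlled drift u_t(x) = -x / (T - t)
    x += drift * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
```

At the final step every path is pulled close to the constrained endpoint X(T) = 0, with residual spread on the order of √dt.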
Diffusions with constraints on a domain: X(t) ∈ Ω.

1. Method I: kill the trajectory if X(t) ∈ ∂Ω for some t.
2. Method II: simulate SDEs with drift u_t(x) = ∇ ln Z_t(x), where
       Z_t(x) = E[ e^{ − ∫_t^T U(X_τ, τ) dτ } | X(t) = x ]
   with U = ∞ if x ∉ Ω and U = 0 otherwise. Hence
       ∂Z_t(x)/∂t + ½ ∂²Z_t(x)/∂x² = 0
   with Z_t(x) = 0 for x ∈ ∂Ω.
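Method I can be sketched as a Monte Carlo kill-on-exit simulation. A minimal illustration with a hypothetical domain Ω = (−1, 1) and a driftless Wiener process (domain and parameters are not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
T, dt, n_paths = 1.0, 0.01, 2000
n_steps = int(T / dt)

x = np.zeros(n_paths)                     # all paths start at the centre of the domain
alive = np.ones(n_paths, dtype=bool)      # which trajectories have not been killed
for _ in range(n_steps):
    x[alive] += np.sqrt(dt) * rng.standard_normal(alive.sum())
    alive &= (np.abs(x) < 1.0)            # kill on leaving Omega = (-1, 1)

survival = alive.mean()                   # Monte Carlo estimate of Pr{ X stays in Omega }
```

The drawback of Method I is that many samples are wasted; Method II spends no samples on killed paths, at the price of having to solve the backward PDE for Z_t.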
Possible approximations if we haven't got KL losses?

• Approximate solution to the control problem
    V_0(x) = ∫_0^T E_u[ ½ u(t)^⊤ R u(t) + U(x(t), t) | X(0) = x ] dt
  for a general matrix R.
• Gaussian measure over paths X_{0:T} induced by a linear (approximate) posterior SDE (Archambeau, Cornford, Opper & Shawe-Taylor, 2007)
    dX(t) = { − A(t) X + b(t) } dt + D^{1/2} dW
  as an approximation to
    dX(t) = { u(X, t) + f(X) } dt + D^{1/2} dW
  Replace u(X, t) ≈ − A(t) X + b(t) − f(X) → nonlinear ODEs for the moments instead of linear PDEs!
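For a linear SDE the mean m(t) and variance S(t) close under ODEs: dm/dt = −A m + b and dS/dt = −2AS + D (in one dimension). A minimal 1-d illustration with hypothetical constant A, b, D — a sketch of the "ODEs for moments" idea, not the full variational scheme of Archambeau et al., which optimises time-varying A(t), b(t):

```python
import numpy as np

# Moment ODEs for dX = (-A X + b) dt + sqrt(D) dW in 1-d:
#   dm/dt = -A m + b        (mean)
#   dS/dt = -2 A S + D      (variance)
A, b, D = 1.0, 2.0, 1.0
dt, n_steps = 0.001, 20000

m, S = 0.0, 0.0                      # start from a point mass at 0
for _ in range(n_steps):
    m += (-A * m + b) * dt
    S += (-2 * A * S + D) * dt
# stationary values: m -> b / A = 2.0,  S -> D / (2 A) = 0.5
```

Integrating these two scalar ODEs replaces solving a PDE over the whole state space, which is the computational payoff of the Gaussian approximation.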
Possible extensions to other losses

Simple non-KL losses:

    KL(q ‖ p) → α KL(q ‖ p_1) − β KL(q ‖ p_2),   with α, β > 0.

Use an iterative method (CCCP style), upper bounding the concave part

    − KL(q ‖ p_2) ≤ − E_q[ ln( q_n / p_2 ) ] + const,

where q_n is the present optimiser.