On the relation between certain stochastic control problems and probabilistic inference

Manfred Opper, Computer Science
• Control problem → inference problem (Bert Kappen & Emanuel Todorov), via the Bellman equation
• Inference problem → control problems, via a variational method
• Things that could be learnt from this and possible extensions
Discrete time: MDPs

• Assume a Markov process with transition probabilities q(x′|x, u) tuned by a 'control' variable u.
• Try to minimise the total expected cost
    V_0(x) = ∑_{t=0}^{T} E_u[ L_t(X_t, u_t) | X_0 = x ]
• Define the 'value' of state x:
    V_t(x) = ∑_{τ ≥ t} E_u[ L_τ(X_τ, u_τ) | X_t = x ]
• Solution via the Bellman equation:
    V_t(x) = min_u [ L_t(x, u) + ∑_{x′} q_t(x′|x, u) V_{t+∆t}(x′) ]
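The Bellman equation above can be solved by a backward recursion over t. A minimal numerical sketch, using hypothetical random costs and transition probabilities on a tiny state space (all names and sizes here are illustrative, not from the slides):

```python
import numpy as np

# Backward Bellman recursion: V_t(x) = min_u [ L_t(x,u) + sum_x' q(x'|x,u) V_{t+1}(x') ]
n_states, n_controls, T = 2, 2, 5
rng = np.random.default_rng(0)

# q[u, x, x'] : transition probabilities, each row normalised
q = rng.random((n_controls, n_states, n_states))
q /= q.sum(axis=2, keepdims=True)
# L[t, x, u] : nonnegative stage costs L_t(x, u)
L = rng.random((T + 1, n_states, n_controls))

V = L[T].min(axis=1)                      # terminal value: V_T(x) = min_u L_T(x,u)
for t in range(T - 1, -1, -1):
    # Q[x, u] = L_t(x,u) + E_q[ V_{t+1}(X') | x, u ]
    Q = L[t] + np.einsum('uxy,y->xu', q, V)
    V = Q.min(axis=1)                     # minimise over the control u
```

After the loop, V holds the minimal expected costs V_0(x) for each start state.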
Continuous time: SDEs

• (Itô) stochastic differential equation for the state X(t) ∈ R^d:
    dX(t) = ( u(X(t), t) + f(X(t)) ) dt + D^{1/2}(X(t)) dW(t)
  with drift u + f and diffusion D^{1/2} dW; W(t) is a vector of independent Wiener processes.
• Limit of the discrete-time process X_k:
    ∆X_k ≡ X_{k+1} − X_k = ( u_t + f(X_k) ) ∆t + D^{1/2}(X_k) √∆t ε_k,
  with ε_k i.i.d. Gaussian.
Continuous time ctd

• Try to minimise the total expected cost
    V_0(x) = ∫_0^T E_u[ L_t(X(t), u(t)) | X(0) = x ] dt
• Define the 'value' of state x:
    V_t(x) = ∫_t^T E_u[ L_s(X(s), u(s)) | X(t) = x ] ds
• Solution via the Hamilton–Jacobi–Bellman equation:
    − ∂V_t(x)/∂t = min_u [ L_t(x, u) + (u + f)^⊤ ∇V_t(x) + ½ Tr(D ∇∇^⊤) V_t(x) ]
• Specialise to L_t(x, u) = ½ u(t)^⊤ R u(t) + U(x(t), t).
• Minimisation leads to u_t = − R^{−1} ∇V_t
• and a nonlinear PDE:
    − ∂V_t(x)/∂t = − ½ (∇V_t)^⊤ R^{−1} (∇V_t) + f^⊤(x) ∇V_t + ½ Tr(D ∇∇^⊤) V_t + U(x, t)
Exact linearisation (Kappen, 2005)

• Assume that D = R^{−1}. Using the transformation V_t(x) = − ln Z_t(x) we get the equation
    ( ∂/∂t + L† ) Z_t(x) = 0
  with the linear operator
    L† = f^⊤ ∇ + ½ Tr(D ∇∇^⊤) − U(x, t)
• and a path-integral representation
    Z_t(x) = E_{u=0}[ e^{ − ∫_t^T U(X(τ), τ) dτ } | X(t) = x ]

Now all kinds of inference tricks apply!
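The path-integral representation can be estimated by plain Monte Carlo: simulate the uncontrolled dynamics and average e^{−∫U}. A minimal sketch with hypothetical choices f = 0, D = 1, U(x) = x²/2 on [0, T] (none of these specifics are from the slides):

```python
import numpy as np

# Monte Carlo estimate of Z_0(0) = E_{u=0}[ exp(-int_0^T U(X(t)) dt) | X(0) = 0 ]
rng = np.random.default_rng(1)
T, dt, n_paths = 1.0, 0.01, 2000
n_steps = int(T / dt)
U = lambda x: 0.5 * x**2              # hypothetical state cost

x = np.zeros(n_paths)                 # all paths start at x = 0
cost = np.zeros(n_paths)              # running integral of U along each path
for _ in range(n_steps):
    cost += U(x) * dt
    # uncontrolled Euler-Maruyama step with f = 0, D = 1
    x += np.sqrt(dt) * rng.standard_normal(n_paths)

Z0 = np.exp(-cost).mean()             # path-integral estimate of Z_0(0)
```

Since U ≥ 0, the estimate must lie strictly between 0 and 1; importance sampling or other inference tricks can reduce its variance.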
Todorov's solvable MDPs (2006)

    L_t(x, u) = ∑_{x′} q(x′|x, u) ln [ q(x′|x, u) / p(x′|x) ] + U_t(x)

Bellman equation:

    V_t(x) = min_u { U_t(x) + ∑_{x′} q(x′|x, u) [ ln( q(x′|x, u) / p(x′|x) ) + V_{t+∆t}(x′) ] }

The controlled transition probabilities:

    q(x′|x, u) ∝ p(x′|x) e^{ − V_{t+∆t}(x′) }

With V_t(x) = − ln Z_t(x) we get the linear equation (Todorov, 2006):

    Z_t(x) = e^{ − U_t(x) } ∑_{x′} p(x′|x) Z_{t+∆t}(x′)
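The linear backward recursion for Z_t, together with the controlled transitions q ∝ p·e^{−V}, is easy to run on a small chain. A minimal sketch with hypothetical random passive dynamics p and state costs U (sizes and seed are illustrative):

```python
import numpy as np

n, T = 3, 4
rng = np.random.default_rng(2)
# passive dynamics p(x'|x), rows normalised
p = rng.random((n, n))
p /= p.sum(axis=1, keepdims=True)
# positive state costs U_t(x)
U = rng.random((T + 1, n)) + 0.1

Z = np.exp(-U[T])                        # terminal desirability Z_T(x) = e^{-U_T(x)}
for t in range(T - 1, -1, -1):
    # controlled transitions q(x'|x) propto p(x'|x) e^{-V_{t+1}(x')} = p(x'|x) Z_{t+1}(x')
    q = p * Z[None, :]
    q /= q.sum(axis=1, keepdims=True)
    Z = np.exp(-U[t]) * (p @ Z)          # linear recursion: Z_t = e^{-U_t} sum_x' p Z_{t+1}

V0 = -np.log(Z)                          # minimal expected cost V_0(x) = -ln Z_0(x)
```

No minimisation over u is needed anywhere: the min is absorbed into the exponential transformation.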
Relation to continuous case

• Short-time transition probability
    p(x′, t + ∆t | x, t) ∝ exp( − (1 / 2∆t) ‖ ∆x − f(x)∆t ‖²_D )
  as ∆t → 0, with ‖F‖²_D = F^⊤ D^{−1} F.
• Let p_g and p_f be the short-time transition probabilities for diffusion processes with drifts g and f and the same diffusion D. Then
    ∫ p_g(x′, t + ∆t | x, t) ln [ p_g(x′, t + ∆t | x, t) / p_f(x′, t + ∆t | x, t) ] dx′ ≃ ½ ‖ g(x, t) − f(x, t) ‖²_D ∆t
The KL divergence for Markov processes

Consider probabilities p(X_{0:T}), q(X_{0:T}) over entire paths X_{0:T}. The total KL divergence ...

    KL[ q(X_{0:T}) ‖ p(X_{0:T}) ] = ∫ dx_{0:T} q(x_{0:T}) ln [ q(x_{0:T}) / p(x_{0:T}) ]
      = ∑_{k=0}^{T−1} ∫ dx_k q(x_k) ∫ dx_{k+1} q(x_{k+1}|x_k) ln [ q(x_{k+1}|x_k) / p(x_{k+1}|x_k) ]
      = ∑_{k=0}^{T−1} ∫ dx_k q(x_k) KL[ q(·|x_k) ‖ p(·|x_k) ]

(assuming equal initial distributions, q(x_0) = p(x_0))

... is the expected sum of KLs for the transition probabilities.
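This chain-rule decomposition can be checked numerically. A minimal sketch on a hypothetical 2-state chain of length 2 with a shared initial distribution (all numbers are illustrative), comparing the brute-force path KL with the sum of expected per-transition KLs:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(5)
init = np.array([0.5, 0.5])                       # shared initial distribution
qT = rng.random((2, 2)); qT /= qT.sum(axis=1, keepdims=True)   # q(x_{k+1}|x_k)
pT = rng.random((2, 2)); pT /= pT.sum(axis=1, keepdims=True)   # p(x_{k+1}|x_k)

# direct KL over all length-3 trajectories (x_0, x_1, x_2)
kl_path = 0.0
for x0, x1, x2 in product(range(2), repeat=3):
    q_prob = init[x0] * qT[x0, x1] * qT[x1, x2]
    p_prob = init[x0] * pT[x0, x1] * pT[x1, x2]
    kl_path += q_prob * np.log(q_prob / p_prob)

# expected sum of per-transition KLs under the q-marginals
marg, kl_sum = init.copy(), 0.0
for _ in range(2):
    step_kl = (qT * np.log(qT / pT)).sum(axis=1)  # KL[q(.|x_k) || p(.|x_k)] per state
    kl_sum += marg @ step_kl
    marg = marg @ qT                              # advance the q-marginal
```

The two quantities agree to machine precision, as the decomposition promises.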
The global solution

The Kappen/Todorov control problems are of the form: minimise the variational free energy

    V_t(x) = KL[ q(X_{t:T}) ‖ p(X_{t:T}) ] + E_q[ ∑_{τ ≥ t} U_τ(X_τ, τ) ]

for fixed X_t = x with respect to q. The optimal controlled probability over paths is

    q*(X_{t:T}) = (1 / Z_t(x)) p(X_{t:T}) e^{ − ∑_{τ ≥ t} U_τ(X_τ, τ) }

with the minimal cost (free energy)

    V_t(x) = − ln Z_t(x) = − ln E_p[ e^{ − ∑_{τ ≥ t} U_τ(X_τ, τ) } | X_t = x ]

This looks like an HMM with 'likelihood' e^{ − ∑_{τ ≥ t} U_τ(X_τ, τ) }.
Comments

• For HMMs,
    Z_t(x) = E_p[ e^{ − ∑_{τ ≥ t} U_τ(X_τ, τ) } | X_t = x ] ∝ P(future data | X_t = x)
  fulfils a linear backward equation.
• The posterior = controlled process has transition probabilities
    q(x_{t+1}|x_t) = [ Z_{t+1}(x_{t+1}) / Z_t(x_t) ] p(x_{t+1}|x_t) e^{ − U_t(x_t, t) }
• Similar things happen in the continuous case:
    ( ∂/∂t + f^⊤ ∇ + ½ Tr(D ∇∇^⊤) − U(x, t) ) Z_t(x) = 0
• The posterior is a diffusion with 'controlled' drift u_t(x) = D ∇ ln Z_t(x).
A 'real' likelihood for continuous-time paths

Consider an inhomogeneous Poisson process with rate function U_t(X_t). Then

    Pr{ no event in [0, T] } = e^{ − ∫_0^T U_s(X_s, s) ds }
Application: simulate diffusions with constraints

Wiener process with fixed endpoint X(T) = 0.
Solution

    u_t(x) = ∂ ln Z_t(x) / ∂x

The backward equation

    ∂Z_t(x)/∂t + ½ ∂²Z_t(x)/∂x² = 0,   Z_T(x) = δ(x)

is solved by

    Z_t(x) ∝ e^{ − x² / 2(T − t) }

and leads to

    u_t(x) = − x / (T − t)   for 0 < t < T.
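The controlled drift −x/(T−t) pins the Wiener process to 0 at time T (a Brownian bridge). A minimal Euler–Maruyama sketch (step size and path count are illustrative; the loop stops one step before T to avoid the singularity in the drift):

```python
import numpy as np

rng = np.random.default_rng(3)
T, dt, n_paths = 1.0, 0.001, 500
n_steps = int(T / dt)

x = np.zeros(n_paths)                 # start at x = 0
for k in range(n_steps - 1):          # stop one step early: drift blows up at t = T
    t = k * dt
    drift = -x / (T - t)              # controlled drift u_t(x) = -x / (T - t)
    x += drift * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
```

At the final step every path is pulled close to the constrained endpoint X(T) = 0, with residual spread on the order of √dt.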
Diffusions with constraints on a domain: X(t) ∈ Ω.

1. Method I: kill the trajectory if X(t) ∈ ∂Ω for some t.
2. Method II: simulate SDEs with drift u_t(x) = ∇ ln Z_t(x), where
       Z_t(x) = E[ e^{ − ∫_t^T U(X_τ, τ) dτ } | X(t) = x ]
   with U = ∞ if x ∉ Ω and U = 0 otherwise. Hence
       ∂Z_t(x)/∂t + ½ ∂²Z_t(x)/∂x² = 0
   with Z_t(x) = 0 for x ∈ ∂Ω.
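Method I can be sketched as a Monte Carlo kill-on-exit simulation. A minimal illustration with a hypothetical domain Ω = (−1, 1) and a driftless Wiener process (domain and parameters are not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
T, dt, n_paths = 1.0, 0.01, 2000
n_steps = int(T / dt)

x = np.zeros(n_paths)                     # all paths start at the centre of the domain
alive = np.ones(n_paths, dtype=bool)      # which trajectories have not been killed
for _ in range(n_steps):
    x[alive] += np.sqrt(dt) * rng.standard_normal(alive.sum())
    alive &= (np.abs(x) < 1.0)            # kill on leaving Omega = (-1, 1)

survival = alive.mean()                   # Monte Carlo estimate of Pr{ X stays in Omega }
```

The drawback of Method I is that many samples are wasted; Method II spends no samples on killed paths, at the price of having to solve the backward PDE for Z_t.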
Possible approximations if we haven't got KL losses?

• Approximate solution to the control problem
    V_0(x) = ∫_0^T E_u[ ½ u(t)^⊤ R u(t) + U(x(t), t) | X(0) = x ] dt
  for a general matrix R.
• Gaussian measure over paths X_{0:T} induced by a linear (approximate) posterior SDE (Archambeau, Cornford, Opper & Shawe-Taylor, 2007)
    dX(t) = { − A(t) X + b(t) } dt + D^{1/2} dW
  as an approximation to
    dX(t) = { u(X, t) + f(X) } dt + D^{1/2} dW
  Replace u(X, t) ≈ − A(t) X + b(t) − f(X) → nonlinear ODEs for the moments instead of linear PDEs!
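For a linear SDE the mean m(t) and variance S(t) close under ODEs: dm/dt = −A m + b and dS/dt = −2AS + D (in one dimension). A minimal 1-d illustration with hypothetical constant A, b, D — a sketch of the "ODEs for moments" idea, not the full variational scheme of Archambeau et al., which optimises time-varying A(t), b(t):

```python
import numpy as np

# Moment ODEs for dX = (-A X + b) dt + sqrt(D) dW in 1-d:
#   dm/dt = -A m + b        (mean)
#   dS/dt = -2 A S + D      (variance)
A, b, D = 1.0, 2.0, 1.0
dt, n_steps = 0.001, 20000

m, S = 0.0, 0.0                      # start from a point mass at 0
for _ in range(n_steps):
    m += (-A * m + b) * dt
    S += (-2 * A * S + D) * dt
# stationary values: m -> b / A = 2.0,  S -> D / (2 A) = 0.5
```

Integrating these two scalar ODEs replaces solving a PDE over the whole state space, which is the computational payoff of the Gaussian approximation.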
Possible extensions to other losses

Simple non-KL losses:

    KL(q ‖ p) → α KL(q ‖ p_1) − β KL(q ‖ p_2),   with α, β > 0.

Use an iterative method (CCCP style), upper bounding the concave part

    − KL(q ‖ p_2) ≤ − E_q[ ln( q_n / p_2 ) ] + const,

where q_n is the present optimiser.