Control, inference and learning
Bert Kappen
SNN, Donders Institute, Radboud University, Nijmegen
Gatsby Unit, UCL, London
July 21, 2015
Why control theory?
A theory for intelligent behaviour:
- neuroscience
- robotics
Control theory
Given a current state and a future desired state, what is the best/cheapest/fastest way to get there?
Why stochastic control?
How to control?
Hard problems:
- a learning and exploration problem
- a stochastic optimal control computation
- a representation problem for the controller u(x, t)
The idea: Control, Inference and Learning
Linear Bellman equation and path integral solution
- Express a control computation as an inference computation.
- Compute the optimal control using MC sampling.
Importance sampling
- Accelerate with importance sampling, using a state-feedback controller.
- Learn the controller from self-generated data.
- The optimal importance sampler is the optimal control.
- Learn a good importance sampler using PICE.
Outline
• Introduction to control theory
• Link between control theory, inference and statistical physics
  – Schrödinger; Fleming & Mitter '82; Kappen '05; Todorov '06
• Importance sampling
  – Relation between optimal sampling and optimal control
• Cross entropy method for adaptive importance sampling (PICE)
  – A criterion for parametrized control optimization
  – Learning by gradient descent
• Some examples
Discrete time optimal control
Consider the control of a discrete time deterministic dynamical system:

    x_{t+1} = x_t + f(x_t, u_t),    t = 0, 1, ..., T-1

x_t describes the state and u_t specifies the control or action at time t. Given x_0 and u_{0:T-1}, we can compute x_{1:T}.

Define a cost for each sequence of controls:

    C(x_0, u_{0:T-1}) = Σ_{t=0}^{T-1} V(x_t, u_t)

Find the sequence u_{0:T-1} that minimizes C(x_0, u_{0:T-1}).
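As a concrete illustration of this problem statement, here is a minimal Python sketch (not from the slides) that rolls a trajectory forward from x_0 under a given control sequence and evaluates its cost; the dynamics f(x, u) = u and the quadratic cost V are illustrative assumptions.

```python
# Minimal sketch: roll out x_{t+1} = x_t + f(x_t, u_t) and accumulate
# C(x_0, u_{0:T-1}) = sum_t V(x_t, u_t). Dynamics and cost are assumed examples.
import numpy as np

def rollout_cost(x0, u_seq, f, V):
    """Return the state trajectory x_{0:T} and the total cost of a control sequence."""
    x, cost, xs = x0, 0.0, [x0]
    for u in u_seq:
        cost += V(x, u)        # accumulate V(x_t, u_t)
        x = x + f(x, u)        # x_{t+1} = x_t + f(x_t, u_t)
        xs.append(x)
    return np.array(xs), cost

# Example: drive the state towards 0 with a penalty on state and control.
f = lambda x, u: u
V = lambda x, u: x**2 + 0.1 * u**2
xs, C = rollout_cost(x0=2.0, u_seq=[-0.5] * 10, f=f, V=V)
print(C)
```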
Dynamic programming
Find the minimal cost path from A to J.
[Figure: stage-wise graph from A to J with edge costs, not shown.]

    C(F) = min(6 + C(H), 3 + C(I)) = 7

The minimal cost at time t is easily expressible in terms of the minimal cost at time t+1.
Discrete time optimal control
Dynamic programming uses the concept of the optimal cost-to-go J(t, x). One can recursively compute J(t, x) from J(t+1, x) for all x in the following way:

    J(t, x_t) = min_{u_{t:T-1}} Σ_{s=t}^{T-1} V(x_s, u_s)
              = min_{u_t} ( V(t, x_t, u_t) + J(t+1, x_t + f(t, x_t, u_t)) )
    J(T, x) = 0
    J(0, x) = min_{u_{0:T-1}} C(x, u_{0:T-1})

This is called the Bellman equation. It computes u_t(x) for all intermediate t, x.
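The backward recursion can be written down directly once the state is discretized. The sketch below is a minimal illustration under assumptions of mine (1-D state, grids for x and u, dynamics x_{t+1} = x_t + u_t, quadratic stage cost), not code from the slides.

```python
# Minimal sketch of the Bellman backward recursion
#   J(t, x) = min_u [ V(x, u) + J(t+1, x + f(x, u)) ],   J(T, x) = 0
# on a discretized 1-D state space; dynamics and cost are assumed examples.
import numpy as np

T = 10
xs = np.linspace(-3.0, 3.0, 121)            # state grid
us = np.linspace(-1.0, 1.0, 21)             # control grid
V = lambda x, u: x**2 + 0.1 * u**2          # stage cost V(x, u)

J = np.zeros((T + 1, xs.size))              # boundary condition J(T, x) = 0
policy = np.zeros((T, xs.size))             # u_t(x)
for t in reversed(range(T)):
    for i, x in enumerate(xs):
        x_next = x + us                               # candidate next states (f(x, u) = u)
        j_next = np.interp(x_next, xs, J[t + 1])      # J(t+1, x_next) by interpolation
        q = V(x, us) + j_next
        k = np.argmin(q)
        J[t, i], policy[t, i] = q[k], us[k]

print(J[0, xs.size // 2], policy[0, xs.size // 2])    # cost-to-go and control at x = 0, t = 0
```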
Stochastic optimal control
Consider a stochastic dynamical system

    dX_i = f_i(X_t, u) dt + dW_i,    E(dW_i dW_j) = ν_{ij} dt

Given x(0), find the control function u(x, t) that minimizes the expected future cost

    C = E[ φ(X_T) + ∫_0^T dt V(X_t, u(X_t, t)) ]

The expectation is over all trajectories given the control function.

    J(t, x) = min_u ( V(x, u) + E[ J(t + dt, x + dx) ] )
    -∂_t J(t, x) = min_u ( V(x, u) + f(x, u)^T ∇_x J(x, t) + ½ Tr( ν ∇_x² J(x, t) ) )

with u = u(x, t) and boundary condition J(x, T) = φ(x). This is the Hamilton-Jacobi-Bellman (HJB) equation.
Computing the optimal control solution is hard:
- it requires solving a Bellman equation, a PDE
- it scales badly with dimension
Efficient solutions exist for:
- linear dynamical systems with quadratic costs (Gaussians)
- deterministic systems (no noise)
Path integral control theory

    dX_t = f(X_t, t) dt + g(X_t, t)( u dt + dW_t )
    C = E[ φ(X_T) + ∫_t^T ds ( V(X_s, s) + ½ u(X_s, s)^T R u(X_s, s) ) ]

with E(dW_a dW_b) = ν_{ab} dt and R = λ ν^{-1}, λ > 0; f ∈ R^n, g ∈ R^{n×m}, u ∈ R^m.

The HJB equation becomes

    -∂_t J = min_u ( ½ u^T R u + V + (f + g u)^T ∇J + ½ Tr( g ν g^T ∇² J ) )

with boundary condition J(x, T) = φ(x).
Path integral control theory
Minimization wrt u yields:

    u(x, t) = -R^{-1} g^T(x, t) ∇J(x, t)
    -∂_t J = -½ (∇J)^T g R^{-1} g^T (∇J) + V + f^T ∇J + ½ Tr( g ν g^T ∇² J )

Define ψ(x, t) through J(x, t) = -λ log ψ(x, t). We obtain a linear HJB:

    ∂_t ψ = ( V/λ - f^T ∇ - ½ Tr( g ν g^T ∇² ) ) ψ
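A short sketch (my own filling-in, using only the definitions above) of why the log transform linearizes the equation; the key assumption is the stated relation R = λν^{-1}:

```latex
% Substituting J = -\lambda \log\psi, with R^{-1} = \nu/\lambda:
\begin{align*}
\nabla J &= -\lambda\,\frac{\nabla\psi}{\psi}, \qquad
\nabla^2 J = -\lambda\,\frac{\nabla^2\psi}{\psi}
           + \lambda\,\frac{(\nabla\psi)(\nabla\psi)^T}{\psi^2} \\[4pt]
-\tfrac{1}{2}(\nabla J)^T g R^{-1} g^T (\nabla J)
  &= -\tfrac{\lambda}{2}\,\frac{(\nabla\psi)^T g\,\nu\,g^T (\nabla\psi)}{\psi^2} \\[4pt]
\tfrac{1}{2}\operatorname{Tr}\!\bigl(g\nu g^T \nabla^2 J\bigr)
  &= -\tfrac{\lambda}{2\psi}\operatorname{Tr}\!\bigl(g\nu g^T \nabla^2\psi\bigr)
   + \tfrac{\lambda}{2}\,\frac{(\nabla\psi)^T g\nu g^T (\nabla\psi)}{\psi^2}
\end{align*}
% The two terms quadratic in \nabla\psi cancel; with -\partial_t J = \lambda\,\partial_t\psi/\psi,
% multiplying through by \psi/\lambda gives
%   \partial_t \psi = \bigl( V/\lambda - f^T\nabla - \tfrac{1}{2}\operatorname{Tr}(g\nu g^T\nabla^2) \bigr)\psi .
```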
Feynman-Kac formula
Denote by q(τ|x, t) the distribution over uncontrolled trajectories that start at x, t:

    dX_t = f(X_t, t) dt + g(X_t, t) dW_t

with τ a trajectory x(t → T). Then

    ψ(x, t) = ∫ dq(τ|x, t) exp( -S(τ)/λ ) = E_q[ e^{-S/λ} ]
    S(τ) = φ(x(T)) + ∫_t^T ds V(x(s), s)
Posterior distribution over optimal trajectories
ψ(x, t) is the partition sum for the distribution over paths under optimal control:

    p*(τ|x, t) = (1/ψ(x, t)) q(τ|x, t) exp( -S(τ)/λ )

The optimal cost-to-go is a free energy:

    J(x, t) = -λ log E_q[ e^{-S/λ} ]

The optimal control is an expectation wrt p*:

    u*(x, t) dt = E_{p*}[ dW_t ] = E_q[ dW_t e^{-S/λ} ] / E_q[ e^{-S/λ} ]

J and u* can be computed by forward sampling from q.
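A minimal Monte Carlo sketch of these two estimators, under assumptions of mine: 1-D state, f = 0, g = 1 (so the uncontrolled dynamics is a Brownian motion), V = 0, and an Euler-Maruyama discretization. It illustrates forward sampling from q; it is not the implementation behind the slides.

```python
# Monte Carlo estimates of
#   J(x,t)      = -lambda * log E_q[ exp(-S/lambda) ]
#   u*(x,t) dt  =  E_q[ dW_t exp(-S/lambda) ] / E_q[ exp(-S/lambda) ]
# by forward sampling from the uncontrolled dynamics dX = dW (f = 0, g = 1, V = 0).
import numpy as np

def path_integral_estimate(x0, t, T, phi, lam=1.0, nu=1.0, dt=0.01, n_samples=10_000, seed=0):
    rng = np.random.default_rng(seed)
    n_steps = int(round((T - t) / dt))
    x = np.full(n_samples, x0, dtype=float)
    dW0 = np.zeros(n_samples)
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(nu * dt), size=n_samples)
        if k == 0:
            dW0 = dW                       # first noise increment, needed for u*(x0, t)
        x = x + dW                         # uncontrolled dynamics
    S = phi(x)                             # path cost (only an end cost here, V = 0)
    w = np.exp(-(S - S.min()) / lam)       # shifted weights for numerical stability
    J = S.min() - lam * np.log(w.mean())
    u_star = (dW0 * w).sum() / (w.sum() * dt)
    return J, u_star

# End cost with two cheap targets at x = +1 and x = -1 (cf. the delayed choice example).
phi = lambda x: np.minimum((x - 1)**2, (x + 1)**2)
print(path_integral_estimate(x0=0.0, t=0.0, T=2.0, phi=phi))
```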
Delayed choice

    dX_t = u(X_t, t) dt + dW_t,    E(dW_t²) = ν dt
    C(p) = E_p[ φ(x_T) + ∫_0^2 dt ½ u(t)² ]

The cost encodes targets at t = 2.
[Figure: sample trajectories, t ∈ [0, 2] horizontally, x ∈ [-3, 3] vertically.]
Delayed choice
Time-to-go T = 2 - t.

    J(x, t) = -ν log E_q[ exp( -φ(X_2)/ν ) ]

[Figure: J(x, t) as a function of x for time-to-go T = 2, 1, 0.5, next to sample trajectories.]
The decision is made at T = 1ν.
"When the future is uncertain, delay your decisions."
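To make the picture concrete, here is a small numerical sketch (my assumptions: two quadratic wells at x = ±1 for φ, ν = 0.5, Gauss-Hermite quadrature for the Gaussian expectation) that evaluates J(x, t) = -ν log E_q[exp(-φ(X_2)/ν)] at different times-to-go: for large time-to-go the cost-to-go has a single minimum near x = 0, while for small time-to-go two minima appear near the targets.

```python
# J(x, t) = -nu * log E[ exp(-phi(X_2)/nu) ] with X_2 ~ N(x, nu * T_togo) under the
# uncontrolled dynamics; phi, nu and the quadrature order are assumed for illustration.
import numpy as np

nu = 0.5
phi = lambda y: np.minimum((y - 1)**2, (y + 1)**2)
nodes, weights = np.polynomial.hermite_e.hermegauss(80)   # quadrature for a standard normal

def J(x, T_togo):
    y = x + np.sqrt(nu * T_togo) * nodes                   # possible end states X_2
    return -nu * np.log(np.sum(weights * np.exp(-phi(y) / nu)) / np.sqrt(2 * np.pi))

xs = np.linspace(-2, 2, 9)
for T_togo in (2.0, 1.0, 0.5):
    print(T_togo, np.round([J(x, T_togo) for x in xs], 2))
# Large T_togo: single minimum near x = 0 (no commitment yet).
# Small T_togo: two minima near x = +/-1 (the decision has been made).
```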
KL control
The uncontrolled dynamics specifies a distribution q(τ|x, t) over trajectories τ from t → T.
The cost for a trajectory τ is S(τ) = φ(x_T) + ∫_t^T ds V(x_s, s).
Find the optimal distribution p(τ|x, t) that minimizes E_p[S] and is 'close' to q(τ|x, t).
KL control
Find p* that minimizes

    C(p) = KL(p|q) + E_p[S],    KL(p|q) = ∫ dτ p(τ|x, t) log( p(τ|x, t) / q(τ|x, t) )

The optimal solution is given by

    p*(τ|x, t) = (1/ψ(x, t)) q(τ|x, t) exp( -S(τ|x, t) )
    ψ(x, t) = ∫ dτ q(τ|x, t) exp( -S(τ|x, t) )

The optimal cost is C(p*) = -log ψ(x, t).
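A one-step verification (my own filling-in, using only the definitions above) that this p* is indeed the minimizer:

```latex
% For any normalized p, rewrite C(p) relative to the claimed optimum p^*:
\begin{align*}
C(p) &= \int d\tau\, p(\tau|x,t)\,\log\frac{p(\tau|x,t)}{q(\tau|x,t)\,e^{-S(\tau|x,t)}}
      = \int d\tau\, p(\tau|x,t)\,\log\frac{p(\tau|x,t)}{\psi(x,t)\,p^*(\tau|x,t)} \\
     &= \mathrm{KL}(p\,\|\,p^*) - \log\psi(x,t)
\end{align*}
% Since KL(p || p^*) >= 0 with equality iff p = p^*, the minimum is attained at p = p^*
% with optimal cost C(p^*) = -log psi(x,t).
```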
Controlled diffusions are a special case
In the case of controlled diffusions, p is parametrised by the control function u(x, t):

    dX_t = f(X_t, t) dt + g(X_t, t)( u(X_t, t) dt + dW_t ),    E(dW_i dW_j) = ν_{ij} dt
    C(p) = E_p[ φ(X_T) + ∫_t^T ds ( ½ u(X_s, s)^T ν^{-1} u(X_s, s) + V(X_s, s) ) ]

ψ(x, t) is the solution of the linear Bellman equation and J(x, t) = -log ψ(x, t) is the optimal cost-to-go.
Sampling efficiency
[Figure: uncontrolled sample trajectories, t ∈ [0, 2] horizontally, x ∈ [-10, 10] vertically.]
Sampling with the uncontrolled dynamics is theoretically correct, but inefficient in practice.
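This is where importance sampling enters: sample from a controlled proposal and correct the weights. The sketch below is a minimal illustration under assumptions of mine (1-D, f = 0, g = 1, λ = ν = 1, V = 0, Euler-Maruyama, and a hypothetical feedback proposal that pulls towards the nearest target); the reweighting uses the Girsanov-style correction so the estimate of J stays consistent.

```python
# Importance sampling for J(x,t) = -log E_q[exp(-S)]: draw paths from the controlled
# dynamics dX = u(x,t) dt + dW and reweight with the path-cost correction
#   S_u = phi(X_T) + sum_k ( 0.5 * u_k^2 * dt + u_k * dW_k ),
# so that J = -log E_{q^u}[exp(-S_u)].  Setup (f = 0, g = 1, lambda = nu = 1, V = 0) is assumed.
import numpy as np

def importance_sampled_J(x0, T, phi, u_fn, dt=0.01, n_samples=10_000, seed=0):
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    x = np.full(n_samples, x0, dtype=float)
    correction = np.zeros(n_samples)
    for k in range(n_steps):
        u = u_fn(x, k * dt)
        dW = rng.normal(0.0, np.sqrt(dt), size=n_samples)
        correction += 0.5 * u**2 * dt + u * dW      # Girsanov-style reweighting term
        x = x + u * dt + dW                         # controlled (proposal) dynamics
    S_u = phi(x) + correction
    m = S_u.min()
    return m - np.log(np.mean(np.exp(-(S_u - m))))  # J, with a log-sum-exp shift

phi = lambda x: np.minimum((x - 1)**2, (x + 1)**2)
uncontrolled = lambda x, t: np.zeros_like(x)
feedback = lambda x, t: -0.5 * (x - np.sign(x + 1e-12))   # hypothetical proposal: pull to nearest target
for name, u_fn in [("uncontrolled", uncontrolled), ("feedback proposal", feedback)]:
    print(name, importance_sampled_J(0.0, 2.0, phi, u_fn))
# Both target the same J; the controlled proposal typically gives a much lower-variance estimate.
```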