Sequential Monte Carlo
Nando de Freitas & Arnaud Doucet
University of British Columbia
December 2009
Tutorial overview

• Introduction — Nando (10 min)
• Part I — Arnaud (50 min)
  – Monte Carlo
  – Sequential Monte Carlo
  – Theoretical convergence
  – Improved particle filters
  – Online Bayesian parameter estimation
  – Particle MCMC
  – Smoothing
  – Gradient-based online parameter estimation
• Break (15 min)
• Part II — NdF (45 min)
  – Beyond state-space models
  – Eigenvalue problems
  – Diffusion, protein folding & stochastic control
  – Time-varying Pitman-Yor processes
  – SMC for static distributions
  – Boltzmann distributions & ABC
SMC in this community

Many researchers in the NIPS community have contributed to the field of Sequential Monte Carlo over the last decade.

• Michael Isard and Andrew Blake popularized the method with their Condensation algorithm for image tracking.
• Soon after, Daphne Koller, Stuart Russell, Kevin Murphy, Sebastian Thrun, Dieter Fox, Frank Dellaert and their colleagues demonstrated the method in AI and robotics.
• Tom Griffiths and colleagues have studied SMC methods in cognitive psychology.
The 20th century – Tracking
[Michael Isard & Andrew Blake (1996)]
The 20th century – Tracking
[Boosted particle filter of Kenji Okuma, Jim Little & David Lowe]
The 20th century – State estimation
[Dieter Fox] http://www.cs.washington.edu/ai/Mobile_Robotics/mcl/
The 20th century – The birth
[Metropolis and Ulam, 1949]
Arnaud's slides will go here.
Sequential Monte Carlo (recap)

Graphical model: hidden chain $X_0 \to X_1 \to X_2 \to X_3$ with observations $Y_1, Y_2, Y_3$:

$$P(X_{0:3} \mid Y_{1:3}) \propto P(X_0)\, P(X_1 \mid X_0)\, P(Y_1 \mid X_1)\, P(X_2 \mid X_1)\, P(Y_2 \mid X_2)\, P(X_3 \mid X_2)\, P(Y_3 \mid X_3)$$
Sequences of distributions

• SMC methods can be used to sample approximately from any sequence of distributions $\{\pi_n\}_{n \geq 1}$ defined on spaces of increasing dimension:

$$\pi_n(x_{1:n}) = \frac{f_n(x_{1:n})}{Z_n}$$

where
  – $f_n : \mathcal{X}^n \to \mathbb{R}^+$ is known point-wise,
  – $Z_n = \int f_n(x_{1:n})\, dx_{1:n}$.

• We introduce a proposal distribution $q_n(x_{1:n})$ to estimate $Z_n$:

$$Z_n = \int \frac{f_n(x_{1:n})}{q_n(x_{1:n})}\, q_n(x_{1:n})\, dx_{1:n} = \int W_n(x_{1:n})\, q_n(x_{1:n})\, dx_{1:n}$$
Importance weights

• Let us construct the proposal sequentially: introduce $q_n(x_n \mid x_{1:n-1})$ to sample component $X_n$ given $X_{1:n-1} = x_{1:n-1}$.

• Then the importance weight satisfies the recursion:

$$W_n = W_{n-1}\, \frac{f_n(x_{1:n})}{f_{n-1}(x_{1:n-1})\, q_n(x_n \mid x_{1:n-1})}$$
SMC algorithm

1. Initialize at time n = 1.

2. At time n ≥ 2:

• Sample $X_n^{(i)} \sim q_n(\cdot \mid X_{1:n-1}^{(i)})$ and augment $X_{1:n}^{(i)} = \left(X_{1:n-1}^{(i)}, X_n^{(i)}\right)$.

• Compute the sequential weight

$$W_n^{(i)} \propto \frac{f_n\left(X_{1:n}^{(i)}\right)}{f_{n-1}\left(X_{1:n-1}^{(i)}\right)\, q_n\left(X_n^{(i)} \mid X_{1:n-1}^{(i)}\right)}.$$

Then the target approximation is

$$\widetilde{\pi}_n(x_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\, \delta_{X_{1:n}^{(i)}}(x_{1:n}).$$

• Resample $X_{1:n}^{(i)} \sim \widetilde{\pi}_n(x_{1:n})$ to obtain

$$\widehat{\pi}_n(x_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_{1:n}^{(i)}}(x_{1:n}).$$
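A compact sketch of this generic loop, assuming the caller supplies an initial sampler for $\pi_1$, a proposal sampler, and the incremental-weight function $f_n / (f_{n-1}\, q_n)$; all names and conventions here are illustrative, not from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

def smc(n_steps, n_particles, init, propose, incr_weight):
    """Generic SMC sketch: extend each path with the proposal, reweight by
    f_n / (f_{n-1} q_n), then resample multinomially. Assumes `init` draws
    exactly from pi_1, so the initial weights are uniform."""
    paths = init(n_particles)[:, None]             # shape (N, 1): X_1^{(i)}
    for _ in range(n_steps - 1):
        x_new = propose(paths)                     # X_n ~ q_n(. | X_{1:n-1})
        paths = np.hstack([paths, x_new[:, None]]) # augment the paths
        w = incr_weight(paths)                     # unnormalized weights
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        paths = paths[idx]                         # resampling step
    return paths                                   # approximate draws from pi_n
```

The next slide instantiates this scheme for filtering in state-space models.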
Example 1: Bayesian filtering

$$f_n(x_{1:n}) = p(x_{1:n}, y_{1:n}), \quad \pi_n(x_{1:n}) = p(x_{1:n} \mid y_{1:n}), \quad Z_n = p(y_{1:n}),$$

$$q_n(x_n \mid x_{1:n-1}) = f(x_n \mid x_{1:n-1}).$$

With this prior (bootstrap) proposal, the sequential weight reduces to the likelihood $p(y_n \mid x_n)$; see the sketch below.
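As a concrete illustration (not from the tutorial slides), here is a minimal bootstrap particle filter for a toy scalar linear-Gaussian model; the model coefficients, noise scales, and particle count are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (an illustrative assumption):
#   X_1 ~ N(0,1),  X_n = 0.9 X_{n-1} + 0.5 V_n,  Y_n = X_n + W_n,
# with V_n, W_n standard normal.
def bootstrap_filter(ys, n_particles=1000):
    """Bootstrap particle filter: propose from the transition prior, so the
    incremental weight is just the likelihood p(y_n | x_n). Returns the
    filtering means E[X_n | y_{1:n}]."""
    x = rng.standard_normal(n_particles)              # X_1 ~ N(0, 1)
    means = []
    for y in ys:
        logw = -0.5 * (y - x) ** 2                    # log N(y; x, 1) + const
        w = np.exp(logw - logw.max())                 # stabilized weights
        w /= w.sum()
        means.append(np.sum(w * x))                   # filtering estimate
        idx = rng.choice(n_particles, size=n_particles, p=w)
        x = 0.9 * x[idx] + 0.5 * rng.standard_normal(n_particles)  # propagate
    return np.array(means)

# Simulate observations from the same model and run the filter.
T = 50
x_true = np.empty(T); x_true[0] = rng.standard_normal()
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + 0.5 * rng.standard_normal()
ys = x_true + rng.standard_normal(T)
print(bootstrap_filter(ys)[:5])
```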
Example 2: Eigen-particles

Computing eigen-pairs of exponentially large matrices and operators is an important problem in science. I will give two motivating examples:

i. The diffusion equation & Schrödinger's equation in quantum physics.
ii. Transfer matrices for estimating the partition function of Boltzmann machines.

Both problems are of enormous importance in physics and learning.
Quantum Monte Carlo

We can map this multivariable differential equation to an eigenvalue problem:

$$\int \psi(r)\, K(s \mid r)\, dr = \lambda\, \psi(s)$$

In the discrete case, this is the largest eigenpair of the $M \times M$ matrix $A$:

$$Ax = \lambda x, \quad \text{i.e.,} \quad \sum_{r=1}^{M} x(r)\, a(r, s) = \lambda\, x(s), \quad s = 1, 2, \ldots, M,$$

where $a(r, s)$ is the entry of $A$ at row $r$ and column $s$.

[JB Anderson, 1975; I Kosztin et al., 1997]
Transfer matrices of Boltzmann Machines

For an $m \times n$ lattice of spins $\mu_{i,j} \in \{-1, 1\}$:

$$Z = \sum_{\{\mu\}} \prod_{j=1}^{n} \exp\left( \nu \sum_{i=1}^{m} \mu_{i,j}\, \mu_{i+1,j} + \nu \sum_{i=1}^{m} \mu_{i,j}\, \mu_{i,j+1} \right) = \sum_{\{\sigma_1, \ldots, \sigma_n\}} \prod_{j=1}^{n} A(\sigma_j, \sigma_{j+1}) = \sum_{k=1}^{2^m} \lambda_k^n$$

where $\sigma_j = (\mu_{1,j}, \ldots, \mu_{m,j})$ collects the $j$-th column of spins.

[see e.g. Onsager; Nimalan Mahendran]
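A small numerical check of the identity above, with illustrative values $m = 3$, $n = 4$, $\nu = 0.4$; the transfer matrix uses periodic boundaries in both directions, consistent with the wrap-around sums on the slide, and brute-force enumeration is only feasible for tiny lattices.

```python
import itertools
import numpy as np

def transfer_matrix(m, nu):
    """2^m x 2^m transfer matrix A(sigma, sigma'): intra-column bonds of
    sigma plus inter-column bonds between sigma and sigma', periodic in i."""
    states = list(itertools.product([-1, 1], repeat=m))
    A = np.empty((2**m, 2**m))
    for r, s in enumerate(states):
        intra = sum(s[i] * s[(i + 1) % m] for i in range(m))
        for c, sp in enumerate(states):
            inter = sum(s[i] * sp[i] for i in range(m))
            A[r, c] = np.exp(nu * (intra + inter))
    return A

m, n, nu = 3, 4, 0.4
lam = np.linalg.eigvals(transfer_matrix(m, nu))
print("sum_k lambda_k^n :", np.sum(lam**n).real)

# Brute-force Z over all spins on the m x n torus (tiny sizes only).
Z = 0.0
for spins in itertools.product([-1, 1], repeat=m * n):
    mu = np.array(spins).reshape(m, n)
    e = (mu * np.roll(mu, -1, 0)).sum() + (mu * np.roll(mu, -1, 1)).sum()
    Z += np.exp(nu * e)
print("brute-force Z    :", Z)
```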
Power method

Let $A$ have $M$ linearly independent eigenvectors $x_1, \ldots, x_M$, ordered so that $|\lambda_1| > |\lambda_2| \geq \cdots$. Then any vector $v$ may be represented as a linear combination of the eigenvectors of $A$: $v = \sum_i c_i x_i$, where the $c_i$ are constants. Consequently, for sufficiently large $n$,

$$A^n v \approx c_1 \lambda_1^n x_1$$
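The corresponding numerical recipe is power iteration; a minimal numpy sketch, where the test matrix and iteration count are illustrative:

```python
import numpy as np

def power_method(A, n_iters=200, seed=1):
    """Power iteration: repeatedly apply A and renormalize. v converges to
    the dominant eigenvector x_1, the Rayleigh quotient to lambda_1."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    for _ in range(n_iters):
        v = A @ v
        v /= np.linalg.norm(v)
    return v @ A @ v, v        # (lambda_1 estimate, eigenvector estimate)

A = np.array([[2.0, 1.0], [1.0, 3.0]])
lam1, x1 = power_method(A)
print(lam1, np.linalg.eigvalsh(A)[-1])   # the two should agree
```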
Particle power method

Successive matrix-vector multiplication maps to kernel-function multiplication (a path integral) in the continuous case:

$$\lambda_1^n\, \psi(x_n) \approx \int \cdots \int v(x_1) \prod_{k=2}^{n} K(x_k \mid x_{k-1})\, dx_{1:n-1}$$

The particle method is obtained by defining

$$f_n(x_{1:n}) = v(x_1) \prod_{k=2}^{n} K(x_k \mid x_{k-1})$$

Consequently $c_1 \lambda_1^n \to Z_n$ and $\psi(x_n) \to \pi(x_n)$. The largest eigenvalue $\lambda_1$ of $K$ is given by the ratio of successive partition functions:

$$\lambda_1 = \frac{Z_n}{Z_{n-1}}$$

The importance weights are

$$W_n = W_{n-1}\, \frac{v(x_1) \prod_{k=2}^{n} K(x_k \mid x_{k-1})}{v(x_1) \prod_{k=2}^{n-1} K(x_k \mid x_{k-1})\; Q(x_n \mid x_{1:n-1})} = W_{n-1}\, \frac{K(x_n \mid x_{n-1})}{Q(x_n \mid x_{1:n-1})}$$
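A toy discrete sketch of the particle power method; everything here is an illustrative assumption: the random positive matrix standing in for $K$, the row-normalized proposal, and the particle counts. The average incremental weight at each step estimates $Z_n / Z_{n-1} \to \lambda_1$, which we compare against the Perron eigenvalue computed by dense linear algebra.

```python
import numpy as np

rng = np.random.default_rng(2)

M, N, n_steps = 5, 2000, 40
K = rng.random((M, M)) + 0.1      # positive matrix standing in for the kernel

# Proposal Q(s | r) = K[r, s] / rowsum[r], so the incremental weight
# K / Q is simply the (state-dependent) row sum.
rowsum = K.sum(axis=1)
Q = K / rowsum[:, None]

x = rng.integers(0, M, size=N)    # v(x_1): uniform over states
log_lam = []
for _ in range(n_steps):
    w = rowsum[x]                                     # incremental weights
    log_lam.append(np.log(w.mean()))                  # log(Z_n / Z_{n-1})
    idx = rng.choice(N, size=N, p=w / w.sum())        # resample by weight
    x = np.array([rng.choice(M, p=Q[r]) for r in x[idx]])  # propose moves

lam1_smc = np.exp(np.mean(log_lam[10:]))              # discard burn-in
lam1_exact = np.abs(np.linalg.eigvals(K)).max()       # Perron eigenvalue
print(lam1_smc, lam1_exact)                           # should be close
```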
Example 3: Particle diffusion

• A particle $\{X_n\}_{n \geq 1}$ evolves in a random medium:

$$X_1 \sim \mu(\cdot), \quad X_{n+1} \mid X_n = x \sim p(\cdot \mid x).$$

• At time $n$, the probability of it being killed is $1 - g(X_n)$, with $0 \leq g(x) \leq 1$.

• One wants to approximate $\Pr(T > n)$, the probability that the particle is still alive at time $n$; a particle sketch follows below.
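A minimal sketch, under an assumed toy medium (Gaussian random walk with survival probability $g(x) = e^{-0.1 x^2}$; the dynamics and constants are illustrative). Instead of actually killing particles, which would leave almost no survivors for large $n$, the SMC version weights by $g$ and resamples; the product of the average weights estimates $\Pr(T > n)$.

```python
import numpy as np

rng = np.random.default_rng(3)

def smc_survival(n_steps, n_particles=10_000):
    """Estimate Pr(T > n) = E[prod_k g(X_k)] by weighting with the survival
    probability g and resampling, rather than killing particles outright."""
    g = lambda x: np.exp(-0.1 * x**2)        # survival probability (toy)
    x = rng.standard_normal(n_particles)     # X_1 ~ mu = N(0, 1)
    log_prob = 0.0
    for _ in range(n_steps):
        w = g(x)
        log_prob += np.log(w.mean())         # log of Pr(T > k) / Pr(T > k-1)
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        x = x[idx] + rng.standard_normal(n_particles)   # propagate survivors
    return np.exp(log_prob)

print(smc_survival(20))
```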