Meta Particle Flow for Sequential Bayesian Inference

Le Song
Associate Professor, CSE
Associate Director, Machine Learning Center
Georgia Institute of Technology

Joint work with Xinshi Chen and Hanjun Dai
Bayesian Inference

Infer the posterior distribution of an unknown parameter x given
• Prior distribution π(x)
• Likelihood function p(o|x)
• Observations o_1, o_2, …, o_t

[Figure: graphical model — parameter x generating observations o_1, o_2, …, o_t]

  p(x | o_{1:t}) = (1/Z) π(x) ∏_{i=1}^{t} p(o_i | x),   Z = ∫ π(x) ∏_{i=1}^{t} p(o_i | x) dx

Computing this posterior is a challenging computational problem for high-dimensional x.
Challenges in Bayesian Inference

Gaussian mixture model
• Prior: x_1, x_2 ~ π(x) = N(0, I)
• Observations: o | x_1, x_2 ~ p(o | x_1, x_2) = (1/2) N(x_1, 1) + (1/2) N(x_1 + x_2, 1)
• With (x_1, x_2) = (1, −2), the resulting posterior has two modes: (1, −2) and (−1, 2)
• Fitting even a single posterior p(x | o_{1:t}) is already not easy [results reported by Dai et al. (2016)]

[Figure: posterior density plots — (a) True posterior, (b) Stochastic Variational Inference, (c) Stochastic Gradient Langevin Dynamics, (d) Gibbs Sampling, (e) One-pass SMC]
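As a concrete illustration (not from the slides), a minimal NumPy/SciPy sketch that draws synthetic observations from this mixture likelihood and evaluates the unnormalized posterior on a 2-D grid; the sample size, grid range, and random seed are my own choices:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(0)

# Ground-truth parameters and synthetic observations from the mixture likelihood.
x_true = np.array([1.0, -2.0])
t = 100
comp = rng.integers(0, 2, size=t)                               # mixture component per draw
means = np.where(comp == 0, x_true[0], x_true[0] + x_true[1])
obs = means + rng.normal(size=t)

# Unnormalized log-posterior on a 2-D grid: log prior + sum of log-likelihoods.
grid = np.linspace(-3, 3, 200)
X1, X2 = np.meshgrid(grid, grid)
log_post = multivariate_normal(np.zeros(2), np.eye(2)).logpdf(np.dstack([X1, X2]))
for o in obs:
    log_post += np.log(0.5 * norm.pdf(o, X1, 1.0) + 0.5 * norm.pdf(o, X1 + X2, 1.0))
post = np.exp(log_post - log_post.max())        # two modes near (1, -2) and (-1, 2)
```

The grid evaluation sidesteps Z only because x is 2-dimensional; in high dimensions neither the grid nor the normalizer is available, which is exactly the challenge the talk addresses.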
Fundamental Principle for Machine Learning

Lots of applications in machine learning:
• Hidden Markov models — true locations x_1, x_2, …, x_t; sensor measurements o_1, o_2, …, o_t
• Topic modeling — infer topics and topic assignments z from observed words w
• Uncertainty quantification — e.g., a nonlinear population model

  y_{t+1} = P y_{t−τ} exp(−y_{t−τ}/y_0) e_t + y_t exp(−δ ε_t),
  e_t ~ Γ(σ_p^{−2}, σ_p^2),  ε_t ~ Γ(σ_d^{−2}, σ_d^2),

  with parameters x = (P, y_0, σ_p, σ_d, τ, δ)
Sequential Bayesian Inference

Online Bayesian inference: observations o_1, o_2, …, o_t arrive sequentially.

An ideal algorithm should:
• Efficiently update p(x | o_{1:t}) to p(x | o_{1:t+1}) when o_{t+1} is observed
• Avoid storing all historical observations o_1, o_2, …, o_t

  p(x | o_{1:t}) ∝ p(x | o_{1:t−1}) · p(o_t | x)
  (updated posterior ∝ current posterior × likelihood)

[Figure: prior π(x) → p(x | o_1) → p(x | o_{1:2}) → … → p(x | o_{1:t}) as o_1, o_2, …, o_t arrive]
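A minimal sketch of this recursion for a 1-D parameter on a density grid (a toy of my own, assuming a Gaussian prior and likelihood and made-up observations); note that it stores only the current posterior, never the history:

```python
import numpy as np
from scipy.stats import norm

# Running posterior as a vector of densities on a 1-D grid.
grid = np.linspace(-5, 5, 1001)
dx = grid[1] - grid[0]
posterior = norm.pdf(grid, 0.0, 1.0)            # start from the prior pi(x) = N(0, 1)

def sequential_update(posterior, o, sigma=1.0):
    """One Bayes update: p(x|o_{1:t}) ∝ p(x|o_{1:t-1}) * p(o_t|x)."""
    unnorm = posterior * norm.pdf(o, grid, sigma)
    return unnorm / (unnorm.sum() * dx)         # renormalize on the grid

for o in [0.8, 1.1, 0.9]:                        # observations arrive one at a time
    posterior = sequential_update(posterior, o)
```

The grid representation only scales to a few dimensions, which motivates carrying a particle set instead.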
Related Work

• MCMC
  – requires a complete scan of the data
• Variational Inference (VI)
  – requires re-optimization for every new observation
• Stochastic approximate inference
  – prescribed algorithms to optimize the final posterior p(x | o_{1:t})
  – cannot exploit the structure of the sequential inference problem
Related Work

Sequential Monte Carlo (Doucet et al., 2001; Balakrishnan & Madigan, 2006)
  – the state of the art for online Bayesian inference
  – but suffers from the path degeneracy problem in high dimensions
  – rejuvenation steps can help but violate the online constraints (Canini et al., 2009)

Can we learn to perform efficient and effective sequential Bayesian updates?
Operator View

Kernel Bayes' Rule (Fukumizu et al., 2012)
  – represents the posterior as an embedding μ_t = E_{p(x|o_{1:t})}[φ(x)]
  – updates the current embedding to the new one: μ_{t+1} = K(μ_t, o_{t+1})
  – views the Bayes update as an operator in a reproducing kernel Hilbert space (RKHS)
  – conceptually nice but limited in practice
Our Approach: Bayesian Inference as Particle Flow

Particle flow
• Start with N particles X_0 = {x_0^1, …, x_0^N}, sampled i.i.d. from the prior π(x)
• Transport the particles to the next posterior via the solution of an initial value problem (IVP):

  dx/dτ = f(X_0, o_1, x(τ)),  ∀τ ∈ [0, T],  with x(0) = x_0^n

  ⟹ solution x_1^n = x(T) = x_0^n + ∫_0^T f(X_0, o_1, x(τ)) dτ

  X_0 = {x_0^1, …, x_0^N} ~ π(x)  ⟶  X_1 = {x_1^1, …, x_1^N} ~ p(x | o_1)
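A runnable sketch of the IVP view using SciPy's ODE solver. The learned velocity of the talk is replaced here by a hand-constructed linear f that provably transports a 1-D Gaussian prior to the corresponding Gaussian posterior, so the result can be checked in closed form; all constants are illustrative assumptions:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.stats import norm

# 1-D Gaussian toy: prior N(0, s0^2), likelihood N(x, s^2), one observation o.
s0, s, o = 2.0, 1.0, 1.5
post_var = 1.0 / (1.0 / s0**2 + 1.0 / s**2)      # closed-form Gaussian posterior
post_mean = post_var * o / s**2
post_sd = np.sqrt(post_var)

mu = lambda tau: tau * post_mean                 # interpolate mean: 0 -> post_mean
sd = lambda tau: s0 + tau * (post_sd - s0)       # interpolate std:  s0 -> post_sd

def velocity(tau, x):
    # Hand-made f(x(tau)): moves the whole Gaussian along the interpolation path,
    # so x(tau) ~ N(mu(tau), sd(tau)^2) for all tau in [0, 1].
    return post_mean + (post_sd - s0) * (x - mu(tau)) / sd(tau)

rng = np.random.default_rng(0)
X0 = rng.normal(0.0, s0, size=200)               # particles sampled from the prior
sol = solve_ivp(velocity, (0.0, 1.0), X0, rtol=1e-8)
X1 = sol.y[:, -1]                                # particles now follow the posterior
print(X1.mean(), X1.std(), "vs", post_mean, post_sd)
```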
Flow Property

The continuity equation expresses the law of local conservation of mass:
  – mass can neither be created nor destroyed
  – nor can it "teleport" from one place to another

  ∂ρ(x, τ)/∂τ = −∇_x · (ρ f)

Theorem. If dx/dτ = f, then the change in log-density follows the differential equation

  d log ρ(x, τ)/dτ = −∇_x · f

Notation
  – dρ/dτ is the material derivative: the rate of change of ρ for a given particle as it moves along its trajectory x = x(τ)
  – ∂ρ/∂τ is the partial derivative: the rate of change of ρ at a fixed point x
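Continuing the Gaussian sketch above, the theorem can be checked numerically by integrating d log ρ/dτ = −∇_x · f alongside each particle; for the linear velocity above, ∇_x · f reduces to σ'(τ)/σ(τ):

```python
# Augment each particle with its log-density and integrate both jointly
# (reuses X0, velocity, sd, post_mean, post_sd, s0 from the previous sketch).
def aug_velocity(tau, state):
    n = state.size // 2
    x = state[:n]
    dlogp = -np.full(n, (post_sd - s0) / sd(tau))   # -div f along the trajectory
    return np.concatenate([velocity(tau, x), dlogp])

logp0 = norm.logpdf(X0, 0.0, s0)                     # log-density under the prior
sol = solve_ivp(aug_velocity, (0.0, 1.0), np.concatenate([X0, logp0]), rtol=1e-8)
x1, logp1 = sol.y[:200, -1], sol.y[200:, -1]         # 200 = number of particles
# Matches the exact posterior log-density up to solver tolerance:
print(np.abs(logp1 - norm.logpdf(x1, post_mean, post_sd)).max())
```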
Particle Flow for Sequential Bayesian Inference

[Figure: particle sets X_0 = {x_0^1, …, x_0^N} → X_1 = {x_1^1, …, x_1^N} → X_2 = {x_2^1, …, x_2^N} → …, transported by the flows ∫_0^T f(X_0, o_1, x(τ)) dτ, ∫_0^T f(X_1, o_2, x(τ)) dτ, ∫_0^T f(X_2, o_3, x(τ)) dτ to track π(x) → p(x | o_1) → p(x | o_{1:2}) → … → p(x | o_{1:t})]

Particle flow for sequential Bayesian inference:

  x_{m+1}^n = x_m^n + ∫_0^T f(X_m, o_{m+1}, x(τ)) dτ

  log p_{m+1}(x_{m+1}^n) = log p_m(x_m^n) − ∫_0^T ∇_x · f(X_m, o_{m+1}, x(τ)) dτ

• Other ODE approaches (e.g., the Neural ODE of Chen et al., 2018) are not designed for the sequential case.
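A sketch of the sequential loop. The learned f(X_m, o_{m+1}, x(τ)) of the talk is replaced by a hypothetical moment-matching velocity that reads the current posterior's mean and standard deviation off the particle set — exact only in a Gaussian toy, but it shows how the same f is reused for every incoming observation:

```python
import numpy as np
from scipy.integrate import solve_ivp

T, s = 1.0, 1.0          # flow horizon; known likelihood p(o|x) = N(x, s^2)

def flow_velocity(X_m, o, x, tau):
    # Moment-matching stand-in for f(X_m, o_{m+1}, x(tau)): estimate the current
    # posterior from the particle set, compute the next Gaussian posterior in
    # closed form, and flow along the interpolation between the two.
    m0, sd0 = X_m.mean(), X_m.std()
    var1 = 1.0 / (1.0 / sd0**2 + 1.0 / s**2)
    m1, sd1 = var1 * (m0 / sd0**2 + o / s**2), np.sqrt(var1)
    mu_tau = m0 + tau * (m1 - m0)
    sd_tau = sd0 + tau * (sd1 - sd0)
    return (m1 - m0) + (sd1 - sd0) * (x - mu_tau) / sd_tau

def sequential_particle_flow(X0, observations):
    X = X0.copy()
    for o in observations:              # o_{m+1} arrives: transport X_m -> X_{m+1}
        X_m = X.copy()                  # the velocity conditions on the whole set
        sol = solve_ivp(lambda tau, x: flow_velocity(X_m, o, x, tau), (0.0, T), X)
        X = sol.y[:, -1]
    return X

rng = np.random.default_rng(0)
X = sequential_particle_flow(rng.normal(0.0, 2.0, size=300), [0.8, 1.1, 0.9])
```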
Does a Shared Flow Velocity f Exist?

  x_0 ~ π(x)  ⟶  x_T = x_0 + ∫_0^T f(input) dτ,  x_T ~ p(x | o_1)

Does a shared flow velocity f exist for different Bayesian inference tasks involving different priors and different observations?

A simple Gaussian example
• Prior π(x) = N(0, Σ_0), likelihood p(o|x) = N(x, Σ), observation o = 0
  ⟹ posterior p(x | o = 0) = N(0, ΣΣ_0 / (Σ + Σ_0))
• Does a shared f exist for priors with different Σ_0? What form must it take?
  – E.g., f of the form f(o, x(τ)) cannot handle different Σ_0, as the sketch below illustrates.
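A few lines of arithmetic make the last point concrete: with o = 0 fixed, two priors with different Σ_0 demand different contractions of the very same (o, x) pairs, so the velocity must also see the prior — for instance through the particle set X_0:

```python
import numpy as np

# Posterior N(0, Sig*Sig0/(Sig+Sig0)): the flow must contract prior samples
# by post_sd / prior_sd, and that factor depends on Sig0.
Sig = 1.0
for Sig0 in (0.5, 4.0):
    post_sd = np.sqrt(Sig * Sig0 / (Sig + Sig0))
    print(f"Sigma_0 = {Sig0}: x -> {post_sd / np.sqrt(Sig0):.3f} * x")
# Sigma_0 = 0.5: x -> 0.816 * x;  Sigma_0 = 4.0: x -> 0.447 * x.
# A particle at x = 1.0 with o = 0 must move differently in the two tasks,
# so no velocity of the form f(o, x(tau)) can serve both.
```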
Existence: Connection to Stochastic Flow

Langevin dynamics is the stochastic process

  dx(τ) = ∇_x log [π(x) p(o|x)] dτ + √2 dW(τ),

where W(τ) is a standard Brownian motion.

Property. If the potential function Ψ(x) ≔ −log [π(x) p(o|x)] is smooth and e^{−Ψ} ∈ L^1(ℝ^d), the Fokker–Planck equation has a unique stationary solution in the form of a Gibbs distribution:

  ρ(x, ∞) = e^{−Ψ}/Z = π(x) p(o|x) / p(o) = p(x | o)
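An Euler–Maruyama simulation of this SDE on the 1-D Gaussian toy from earlier (step size and iteration count are my choices) shows the particle cloud settling at the closed-form posterior:

```python
import numpy as np

# Langevin dynamics for prior N(0, s0^2), likelihood N(x, s^2), observation o.
s0, s, o = 2.0, 1.0, 1.5

def grad_log_target(x):
    # gradient of log pi(x) + log p(o|x)
    return -x / s0**2 + (o - x) / s**2

rng = np.random.default_rng(0)
x = rng.normal(0.0, s0, size=2000)
dt = 1e-3
for _ in range(20000):
    x = x + grad_log_target(x) * dt + np.sqrt(2 * dt) * rng.normal(size=x.size)

post_var = 1.0 / (1.0 / s0**2 + 1.0 / s**2)
print(x.mean(), x.var(), "vs", post_var * o / s**2, post_var)   # ~ (1.2, 0.8)
```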
Existence: Connection to Stochastic Flow

The probability density ρ(x, τ) of x(τ) follows a deterministic evolution according to the Fokker–Planck equation

  ∂ρ/∂τ = −∇_x · (ρ ∇_x log [π(x) p(o|x)]) + Δ_x ρ
        = −∇_x · (ρ (∇_x log [π(x) p(o|x)] − ∇_x log ρ(x, τ))),

which has the form of the continuity equation.

Theorem. When the deterministic transformation of the random variable x(τ) follows

  dx/dτ = ∇_x log [π(x) p(o|x)] − ∇_x log ρ(x, τ),

its probability density ρ(x, τ) converges to the posterior p(x | o) as τ → ∞.
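The same toy with the Brownian noise replaced by −∇_x log ρ(x, τ), estimated from the particles themselves via a Gaussian kernel density estimate (the bandwidth is an assumption). This makes the closed-loop nature of the flow concrete: every step requires the current density ρ:

```python
# Deterministic counterpart of the Langevin simulation above (reuses
# grad_log_target, s0, rng): noise traded for -grad log rho, estimated by KDE.
def grad_log_kde(x, bw=0.25):
    # d/dx of log sum_j K((x - x_j)/bw) for a Gaussian kernel K
    diff = x[:, None] - x[None, :]               # (N, N) pairwise differences
    w = np.exp(-0.5 * (diff / bw) ** 2)
    return (w * (-diff / bw**2)).sum(axis=1) / w.sum(axis=1)

x = rng.normal(0.0, s0, size=500)
dt = 1e-2
for _ in range(2000):
    x = x + (grad_log_target(x) - grad_log_kde(x)) * dt

print(x.mean(), x.var())   # approaches the posterior moments (1.2, 0.8)
```

The KDE bandwidth biases the estimate of ∇_x log ρ, which is one practical reason to seek the open-loop velocity discussed on the next slide.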
Existence: Closed-Loop to Open-Loop Conversion

Closed loop to open loop
• The Fokker–Planck equation leads to a closed-loop flow, depending not just on π(x) and p(o|x), but also on the flow state ρ(x, τ).
• Is there an equivalent form, independent of ρ(x, τ), that achieves the same flow?

Optimization problem

  min_w  D( ρ(x, ∞), p(x | o) )   s.t.   dx/dτ = ∇_x log [π(x) p(o|x)] − w

• Positive answer: there exists a fixed and deterministic flow velocity f of the form

  dx/dτ = ∇_x log [π(x) p(o|x)] − w*(π(x), p(o|x), x, τ)