
Bayesian inference and mathematical imaging. Part II: Markov chain Monte Carlo



  1. Bayesian inference and mathematical imaging. Part II: Markov chain Monte Carlo. Dr. Marcelo Pereyra, http://www.macs.hw.ac.uk/~mp71/, Maxwell Institute for Mathematical Sciences, Heriot-Watt University. January 2019, CIRM, Marseille.

  2. Outline: 1. Bayesian inference in imaging inverse problems; 2. Proximal Markov chain Monte Carlo; 3. Uncertainty quantification in astronomical and medical imaging; 4. Image model selection and model calibration; 5. Conclusion.

  3. Imaging inverse problems. We are interested in an unknown image x ∈ R^d. We measure y, related to x by a statistical model p(y∣x). The recovery of x from y is ill-posed or ill-conditioned, resulting in significant uncertainty about x. For example, in many imaging problems y = Ax + w, for some operator A that is rank-deficient, and additive noise w.
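As a point of reference, the degradation model y = Ax + w can be simulated directly; the sketch below uses a hypothetical one-dimensional signal, a moving-average blur standing in for the rank-deficient operator A, and an assumed noise level, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 128
x = np.zeros(d)
x[30:40] = 1.0                                  # hypothetical 1D "image" (a box signal)

kernel = np.ones(9) / 9.0                       # moving-average blur: an illustrative rank-deficient A
A = lambda u: np.convolve(u, kernel, mode="same")

sigma = 0.05                                    # assumed noise standard deviation
y = A(x) + sigma * rng.standard_normal(d)       # observation y = Ax + w
```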

  4. The Bayesian framework. We use priors to reduce uncertainty and deliver accurate results. Given the prior p(x), the posterior distribution of x given y, p(x∣y) = p(y∣x) p(x) / p(y), models our knowledge about x after observing y. In this talk we assume that p(x∣y) is log-concave; i.e., p(x∣y) = exp{−φ(x)}/Z, where φ(x) is a convex function and Z = ∫ exp{−φ(x)} dx.

  5. Maximum-a-posteriori (MAP) estimation. The predominant Bayesian approach in imaging is MAP estimation: x̂_MAP = argmax_{x ∈ R^d} p(x∣y) = argmin_{x ∈ R^d} φ(x), (1) computed efficiently, even in very high dimensions, by (proximal) convex optimisation (Chambolle and Pock, 2016).

  6. Illustrative example: astronomical image reconstruction. Recover x ∈ R^d from the low-dimensional degraded observation y = MFx + w, where F is the continuous Fourier transform, M ∈ C^{m×d} is a measurement operator and w is Gaussian noise. We use the model p(x∣y) ∝ exp(−∥y − MFx∥²/2σ² − θ∥Ψx∥₁) 1_{R^n₊}(x). (2) Figure: radio-interferometric image reconstruction of the W28 supernova (observation y and MAP estimate x̂_MAP).

  7. MAP estimation by proximal optimisation. To compute x̂_MAP we use a proximal splitting algorithm. Let f(x) = ∥y − MFx∥²/2σ² and g(x) = θ∥Ψx∥₁ − log 1_{R^n₊}(x), where f and g are l.s.c. convex on R^d, and f is L_f-Lipschitz differentiable. For example, we could use a proximal gradient iteration x_{m+1} = prox_g^{1/L_f}{x_m − L_f^{−1} ∇f(x_m)}, which converges to x̂_MAP at rate O(1/m), with possible acceleration to O(1/m²). Definition: for λ > 0, the λ-proximal operator of a convex l.s.c. function g is defined as (Moreau, 1962) prox_g^λ(x) ≜ argmin_{u ∈ R^N} g(u) + (1/2λ)∥u − x∥².
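A minimal sketch of this proximal gradient iteration, written for a simplified variant of the model in which Ψ = I and the positivity constraint is dropped, so that prox_g reduces to soft-thresholding; the operator A, its adjoint At, and the parameter values below are assumptions made only for the example.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_gradient_map(y, A, At, sigma, theta, L_f, n_iter=500):
    """x_{m+1} = prox_g^{1/L_f}(x_m - (1/L_f) * grad f(x_m)),
    for f(x) = ||y - A x||^2 / (2 sigma^2) and g(x) = theta * ||x||_1.
    L_f should upper-bound ||A||^2 / sigma^2."""
    x = At(y)
    for _ in range(n_iter):
        grad_f = At(A(x) - y) / sigma**2
        x = soft_threshold(x - grad_f / L_f, theta / L_f)
    return x

# Example with A = identity (pure denoising), where L_f = 1 / sigma^2.
y = np.array([0.1, 2.0, -0.3, 1.5])
x_map = prox_gradient_map(y, lambda u: u, lambda u: u, sigma=0.5, theta=1.0, L_f=1 / 0.5**2)
```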

  8. MAP estimation by proximal optimisation. The alternating direction method of multipliers (ADMM) algorithm x_{m+1} = prox_f^λ(z_m − u_m), z_{m+1} = prox_g^λ(x_{m+1} + u_m), u_{m+1} = u_m + x_{m+1} − z_{m+1}, also converges to x̂_MAP very quickly, and does not require f to be smooth. However, MAP estimation has some limitations, e.g.: 1. it provides little information about p(x∣y); 2. it struggles with unknown/partially unknown models; 3. it is not theoretically well understood (yet).
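A sketch of this ADMM iteration for the same simplified denoising model (A = I, g = θ∥·∥₁), where both proximal operators have closed forms; the penalty λ and the other parameter values are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_map(y, sigma, theta, lam=1.0, n_iter=200):
    """ADMM for f(x) = ||y - x||^2 / (2 sigma^2) and g(x) = theta * ||x||_1.
    Here prox_{lam f}(v) = (lam * y + sigma^2 * v) / (lam + sigma^2)
    and prox_{lam g} is soft-thresholding with threshold lam * theta."""
    x = np.zeros_like(y)
    z = np.zeros_like(y)
    u = np.zeros_like(y)
    for _ in range(n_iter):
        x = (lam * y + sigma**2 * (z - u)) / (lam + sigma**2)   # x-update: prox_{lam f}(z - u)
        z = soft_threshold(x + u, lam * theta)                   # z-update: prox_{lam g}(x + u)
        u = u + x - z                                            # dual update
    return x
```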

  9. Outline: 1. Bayesian inference in imaging inverse problems; 2. Proximal Markov chain Monte Carlo; 3. Uncertainty quantification in astronomical and medical imaging; 4. Image model selection and model calibration; 5. Conclusion.

  10. Inference by Markov chain Monte Carlo integration. Monte Carlo integration: given a set of samples X_1, …, X_M distributed according to p(x∣y), we approximate posterior expectations and probabilities by (1/M) Σ_{m=1}^{M} h(X_m) → E{h(x)∣y} as M → ∞. Markov chain Monte Carlo: construct a Markov kernel X_{m+1}∣X_m ∼ K(⋅∣X_m) such that the Markov chain X_1, …, X_M has p(x∣y) as its stationary distribution. MCMC simulation in high-dimensional spaces is very challenging.
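Once samples are available, the estimator above is a one-liner; the choices of h shown in the comments (posterior mean, an exceedance probability with a made-up pixel index and threshold) are only illustrative.

```python
import numpy as np

def mc_expectation(samples, h):
    """(1/M) * sum_m h(X_m), a Monte Carlo estimate of E{h(x) | y}."""
    return np.mean([h(x) for x in samples], axis=0)

# With `samples` an array of MCMC draws from p(x|y):
# post_mean = mc_expectation(samples, lambda x: x)                   # posterior mean
# p_bright  = mc_expectation(samples, lambda x: float(x[42] > 0.5))  # P(x_42 > 0.5 | y)
```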

  11. Unadjusted Langevin algorithm. Suppose for now that p(x∣y) ∈ C¹. Then we can generate samples by mimicking a Langevin diffusion process X that converges to p(x∣y) as t → ∞: dX_t = (1/2) ∇ log p(X_t∣y) dt + dW_t, 0 ≤ t ≤ T, X_0 = x_0, where W is the n-dimensional Brownian motion. Because solving for X_t exactly is generally not possible, we use an Euler-Maruyama approximation and obtain the “unadjusted Langevin algorithm” ULA: X_{m+1} = X_m + δ ∇ log p(X_m∣y) + √(2δ) Z_{m+1}, with Z_{m+1} ∼ N(0, I_n). ULA is remarkably efficient when p(x∣y) is sufficiently regular.
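A minimal sketch of the ULA recursion for a generic smooth target; grad_log_post, the step size δ, and the Gaussian toy target in the example are assumptions made for illustration (in practice δ is tied to the Lipschitz constant of ∇ log p(x∣y)).

```python
import numpy as np

def ula(grad_log_post, x0, delta, n_samples, rng=None):
    """Unadjusted Langevin algorithm:
    X_{m+1} = X_m + delta * grad log p(X_m | y) + sqrt(2*delta) * Z_{m+1}."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        x = x + delta * grad_log_post(x) + np.sqrt(2 * delta) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Toy check on a standard Gaussian target, where grad log p(x) = -x.
chain = ula(lambda x: -x, x0=np.zeros(2), delta=0.1, n_samples=5000)
```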

  12. Metropolis-adjusted Langevin algorithm. ULA does not exactly target p(x∣y) because of the time-discrete approximation. In many problems this estimation bias is acceptable. The error can be removed by using a so-called Metropolis-Hastings correction. Given X_m at iteration m, we perform: 1. a ULA step X* = X_m + δ ∇ log p(X_m∣y) + √(2δ) Z_{m+1}, with Z_{m+1} ∼ N(0, I_n); 2. with probability ρ(X*, X_m) we set X_{m+1} = X*, else we set X_{m+1} = X_m, where ρ(X*, X_m) = min[1, p(X*∣y) q(X_m∣X*) / (p(X_m∣y) q(X*∣X_m))] and q denotes the ULA proposal density.
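A sketch of this Metropolis-adjusted version, written for a generic pair log_post / grad_log_post (both placeholders); the log_q terms implement the proposal-density ratio in step 2, and the parameter values in the toy example are assumptions.

```python
import numpy as np

def mala(log_post, grad_log_post, x0, delta, n_samples, rng=None):
    """Metropolis-adjusted Langevin algorithm: ULA proposal + accept/reject step."""
    rng = rng or np.random.default_rng()

    def log_q(x_to, x_from):
        # Log density (up to a constant) of the ULA proposal N(x_from + delta*grad, 2*delta*I)
        diff = x_to - x_from - delta * grad_log_post(x_from)
        return -np.sum(diff**2) / (4 * delta)

    x = np.asarray(x0, dtype=float)
    samples, accepted = [], 0
    for _ in range(n_samples):
        x_star = x + delta * grad_log_post(x) + np.sqrt(2 * delta) * rng.standard_normal(x.shape)
        log_rho = log_post(x_star) - log_post(x) + log_q(x, x_star) - log_q(x_star, x)
        if np.log(rng.uniform()) < min(0.0, log_rho):
            x, accepted = x_star, accepted + 1
        samples.append(x.copy())
    return np.array(samples), accepted / n_samples

# Toy check on a standard Gaussian target.
chain, acc_rate = mala(lambda x: -0.5 * np.sum(x**2), lambda x: -x,
                       x0=np.zeros(2), delta=0.5, n_samples=5000)
```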

  13. Metropolis-adjusted Langevin algorithm. Some observations: this correction removes the bias at the expense of additional variance; the efficiency of the method depends strongly on δ; the optimal efficiency is achieved for E(ρ) ≈ 0.6 as the dimension d → ∞.

  14. Metropolis-adjusted Langevin algorithm. Some observations: this correction removes the bias at the expense of additional variance; the efficiency of the method depends strongly on δ; the optimal efficiency is achieved for E(ρ) ≈ 0.6 as the dimension d → ∞. This suggests adapting the step size: 1. a ULA step X* = X_m + δ_{m+1} ∇ log p(X_m∣y) + √(2δ_{m+1}) Z_{m+1}, with Z_{m+1} ∼ N(0, I_n); 2. with probability ρ(X*, X_m) we set X_{m+1} = X*, else we set X_{m+1} = X_m, where ρ(X*, X_m) = min[1, p(X*∣y) q(X_m∣X*) / (p(X_m∣y) q(X*∣X_m))]; 3. update δ_{m+2} = δ_{m+1} + α_{m+1} (ρ(X*, X_m) − 0.6), for some sequence {α_m}_{m=1}^∞.
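The update in step 3 is a stochastic-approximation rule that pushes the acceptance probability towards 0.6; a small sketch, with the decaying sequence α_m = c/m as a purely illustrative choice, could look like the following.

```python
def adapt_step_size(delta, rho, m, c=0.01):
    """delta_{m+2} = delta_{m+1} + alpha_{m+1} * (rho - 0.6),
    with the illustrative choice alpha_m = c / m."""
    alpha = c / (m + 1)
    return max(delta + alpha * (rho - 0.6), 1e-8)   # keep the step size positive
```

Inside the MALA loop sketched above, this would be called once per iteration with rho = min(1, exp(log_rho)).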

  15. Non-smooth models. Suppose that p(x∣y) ∝ exp{−f(x) − g(x)}, (3) where f(x) and g(x) are l.s.c. convex functions from R^d → (−∞, +∞], f is L_f-Lipschitz differentiable, and g ∉ C¹. For example, f(x) = (1/2σ²)∥y − Ax∥² and g(x) = α∥Bx∥_† + 1_S(x), for some linear operators A, B, norm ∥⋅∥_†, and convex set S. Unfortunately, such non-smooth models are beyond the scope of ULA. Idea: regularise p(x∣y) to enable efficient Langevin sampling.

  16. Approximation of p(x∣y). Moreau-Yosida approximation of p(x∣y) (Pereyra, 2015): let λ > 0. We propose to approximate p(x∣y) with the density p^λ(x∣y) = exp[−f(x) − g^λ(x)] / ∫_{R^d} exp[−f(u) − g^λ(u)] du, where g^λ is the Moreau-Yosida envelope of g, given by g^λ(x) = inf_{u ∈ R^d} {g(u) + (2λ)^{−1} ∥u − x∥²}, and where λ controls the approximation error involved.
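Because the infimum defining g^λ is attained at prox_g^λ(x), the envelope can be evaluated directly from the proximal operator via g^λ(x) = g(prox_g^λ(x)) + (2λ)^{−1}∥prox_g^λ(x) − x∥². The sketch below checks this for the scalar example g(x) = ∣x∣, whose prox is soft-thresholding; the value of λ is an arbitrary illustrative choice.

```python
import numpy as np

def moreau_yosida_envelope(g, prox_g, x, lam):
    """g^lambda(x) = g(p) + ||p - x||^2 / (2*lam), with p = prox_{lam g}(x)."""
    p = prox_g(x, lam)
    return g(p) + np.sum((p - x)**2) / (2 * lam)

# Scalar example g(x) = |x|: the prox is soft-thresholding and g^lambda is the Huber function.
g = lambda x: np.sum(np.abs(x))
prox_abs = lambda x, lam: np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

val = moreau_yosida_envelope(g, prox_abs, np.array([2.0]), lam=0.5)
# val == 1.75, i.e. |2| - lam/2, the Huber value for |x| >= lam.
```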

  17. Moreau-Yosida approximations. Key properties (Pereyra, 2015; Durmus et al., 2018): 1. for all λ > 0, p^λ defines a proper density of a probability measure on R^d; 2. convexity and differentiability: p^λ is log-concave on R^d, and p^λ ∈ C¹ even if p is not differentiable, with ∇ log p^λ(x∣y) = −∇f(x) + {prox_g^λ(x) − x}/λ, where prox_g^λ(x) = argmin_{u ∈ R^N} g(u) + (1/2λ)∥u − x∥²; moreover, ∇ log p^λ is Lipschitz continuous with constant L ≤ L_f + λ^{−1}; 3. approximation error between p^λ(x∣y) and p(x∣y): lim_{λ→0} ∥p^λ − p∥_TV = 0, and if g is L_g-Lipschitz, then ∥p^λ − p∥_TV ≤ λ L_g².
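Properties 1-2 make p^λ(x∣y) a valid, smooth target for the Langevin machinery of the previous slides: replacing ∇ log p by the expression above gives a Moreau-Yosida regularised ULA step, the basis of the proximal MCMC methods in this talk. A minimal sketch, assuming grad_f and a λ-proximal operator prox_g are supplied by the user (both placeholders here):

```python
import numpy as np

def my_ula(grad_f, prox_g, x0, delta, lam, n_samples, rng=None):
    """Langevin sampling of the smoothed posterior p^lambda(x|y), using
    grad log p^lambda(x|y) = -grad f(x) + (prox_{lam g}(x) - x) / lam."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        grad = -grad_f(x) + (prox_g(x, lam) - x) / lam
        x = x + delta * grad + np.sqrt(2 * delta) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Toy example: f = 0 (flat likelihood) and g(x) = |x|, so the chain targets a smoothed Laplace density.
prox_abs = lambda x, lam: np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
chain = my_ula(lambda x: np.zeros_like(x), prox_abs, x0=np.zeros(1),
               delta=0.05, lam=0.5, n_samples=5000)
```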

  18. Illustration. Examples of Moreau-Yosida approximations: p(x) ∝ exp(−x⁴), p(x) ∝ 1_{[−0.5, 0.5]}(x), and p(x) ∝ exp(−∣x∣). Figure: true densities (solid blue) and approximations (dashed red).
