
Bayesian inference and mathematical imaging. Part II: Markov chain Monte Carlo



  1. Bayesian inference and mathematical imaging. Part II: Markov chain Monte Carlo. Dr. Marcelo Pereyra, http://www.macs.hw.ac.uk/~mp71/, Maxwell Institute for Mathematical Sciences, Heriot-Watt University. January 2019, CIRM, Marseille.

  2. Outline: 1. Bayesian inference in imaging inverse problems; 2. Proximal Markov chain Monte Carlo; 3. Uncertainty quantification in astronomical and medical imaging; 4. Image model selection and model calibration; 5. Conclusion.

  3. Imaging inverse problems. We are interested in an unknown image x ∈ R^d. We measure y, related to x by a statistical model p(y∣x). The recovery of x from y is ill-posed or ill-conditioned, resulting in significant uncertainty about x. For example, in many imaging problems y = Ax + w, for some operator A that is rank-deficient, and additive noise w.
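As a point of reference, the degradation model y = Ax + w can be simulated directly; the sketch below uses a hypothetical one-dimensional signal, a moving-average blur standing in for the rank-deficient operator A, and an assumed noise level, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 128
x = np.zeros(d)
x[30:40] = 1.0                                  # hypothetical 1D "image" (a box signal)

kernel = np.ones(9) / 9.0                       # moving-average blur: an illustrative rank-deficient A
A = lambda u: np.convolve(u, kernel, mode="same")

sigma = 0.05                                    # assumed noise standard deviation
y = A(x) + sigma * rng.standard_normal(d)       # observation y = Ax + w
```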

  4. The Bayesian framework. We use priors to reduce uncertainty and deliver accurate results. Given the prior p(x), the posterior distribution of x given y, p(x∣y) = p(y∣x) p(x) / p(y), models our knowledge about x after observing y. In this talk we assume that p(x∣y) is log-concave; i.e., p(x∣y) = exp{−φ(x)}/Z, where φ(x) is a convex function and Z = ∫ exp{−φ(x)} dx.

  5. Maximum-a-posteriori (MAP) estimation. The predominant Bayesian approach in imaging is MAP estimation: x̂_MAP = argmax_{x ∈ R^d} p(x∣y) = argmin_{x ∈ R^d} φ(x), (1) computed efficiently, even in very high dimensions, by (proximal) convex optimisation (Chambolle and Pock, 2016).

  6. Illustrative example: astronomical image reconstruction. Recover x ∈ R^d from the low-dimensional degraded observation y = MFx + w, where F is the continuous Fourier transform, M ∈ C^{m×d} is a measurement operator and w is Gaussian noise. We use the model p(x∣y) ∝ exp(−∥y − MFx∥²/2σ² − θ∥Ψx∥₁) 1_{R^n₊}(x). (2) Figure: radio-interferometric image reconstruction of the W28 supernova (observation y and MAP estimate x̂_MAP).

  7. MAP estimation by proximal optimisation. To compute x̂_MAP we use a proximal splitting algorithm. Let f(x) = ∥y − MFx∥²/2σ² and g(x) = θ∥Ψx∥₁ − log 1_{R^n₊}(x), where f and g are l.s.c. convex on R^d, and f is L_f-Lipschitz differentiable. For example, we could use a proximal gradient iteration x_{m+1} = prox_g^{1/L_f}{x_m − L_f^{−1} ∇f(x_m)}, which converges to x̂_MAP at rate O(1/m), with possible acceleration to O(1/m²). Definition: for λ > 0, the λ-proximal operator of a convex l.s.c. function g is defined as (Moreau, 1962) prox_g^λ(x) ≜ argmin_{u ∈ R^N} g(u) + (1/2λ)∥u − x∥².
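A minimal sketch of this proximal gradient iteration, written for a simplified variant of the model in which Ψ = I and the positivity constraint is dropped, so that prox_g reduces to soft-thresholding; the operator A, its adjoint At, and the parameter values below are assumptions made only for the example.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_gradient_map(y, A, At, sigma, theta, L_f, n_iter=500):
    """x_{m+1} = prox_g^{1/L_f}(x_m - (1/L_f) * grad f(x_m)),
    for f(x) = ||y - A x||^2 / (2 sigma^2) and g(x) = theta * ||x||_1.
    L_f should upper-bound ||A||^2 / sigma^2."""
    x = At(y)
    for _ in range(n_iter):
        grad_f = At(A(x) - y) / sigma**2
        x = soft_threshold(x - grad_f / L_f, theta / L_f)
    return x

# Example with A = identity (pure denoising), where L_f = 1 / sigma^2.
y = np.array([0.1, 2.0, -0.3, 1.5])
x_map = prox_gradient_map(y, lambda u: u, lambda u: u, sigma=0.5, theta=1.0, L_f=1 / 0.5**2)
```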

  8. MAP estimation by proximal optimisation. The alternating direction method of multipliers (ADMM) algorithm x_{m+1} = prox_f^λ(z_m − u_m), z_{m+1} = prox_g^λ(x_{m+1} + u_m), u_{m+1} = u_m + x_{m+1} − z_{m+1}, also converges to x̂_MAP very quickly, and does not require f to be smooth. However, MAP estimation has some limitations, e.g.: 1. it provides little information about p(x∣y); 2. it struggles with unknown/partially unknown models; 3. it is not theoretically well understood (yet).
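A sketch of this ADMM iteration for the same simplified denoising model (A = I, g = θ∥·∥₁), where both proximal operators have closed forms; the penalty λ and the other parameter values are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_map(y, sigma, theta, lam=1.0, n_iter=200):
    """ADMM for f(x) = ||y - x||^2 / (2 sigma^2) and g(x) = theta * ||x||_1.
    Here prox_{lam f}(v) = (lam * y + sigma^2 * v) / (lam + sigma^2)
    and prox_{lam g} is soft-thresholding with threshold lam * theta."""
    x = np.zeros_like(y)
    z = np.zeros_like(y)
    u = np.zeros_like(y)
    for _ in range(n_iter):
        x = (lam * y + sigma**2 * (z - u)) / (lam + sigma**2)   # x-update: prox_{lam f}(z - u)
        z = soft_threshold(x + u, lam * theta)                   # z-update: prox_{lam g}(x + u)
        u = u + x - z                                            # dual update
    return x
```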

  9. Outline: 1. Bayesian inference in imaging inverse problems; 2. Proximal Markov chain Monte Carlo; 3. Uncertainty quantification in astronomical and medical imaging; 4. Image model selection and model calibration; 5. Conclusion.

  10. Inference by Markov chain Monte Carlo integration. Monte Carlo integration: given a set of samples X_1, …, X_M distributed according to p(x∣y), we approximate posterior expectations and probabilities by (1/M) Σ_{m=1}^{M} h(X_m) → E{h(x)∣y} as M → ∞. Markov chain Monte Carlo: construct a Markov kernel X_{m+1}∣X_m ∼ K(⋅∣X_m) such that the Markov chain X_1, …, X_M has p(x∣y) as its stationary distribution. MCMC simulation in high-dimensional spaces is very challenging.
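Once samples are available, the estimator above is a one-liner; the choices of h shown in the comments (posterior mean, an exceedance probability with a made-up pixel index and threshold) are only illustrative.

```python
import numpy as np

def mc_expectation(samples, h):
    """(1/M) * sum_m h(X_m), a Monte Carlo estimate of E{h(x) | y}."""
    return np.mean([h(x) for x in samples], axis=0)

# With `samples` an array of MCMC draws from p(x|y):
# post_mean = mc_expectation(samples, lambda x: x)                   # posterior mean
# p_bright  = mc_expectation(samples, lambda x: float(x[42] > 0.5))  # P(x_42 > 0.5 | y)
```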

  11. Unadjusted Langevin algorithm. Suppose for now that p(x∣y) ∈ C¹. Then we can generate samples by mimicking a Langevin diffusion process X that converges to p(x∣y) as t → ∞: dX_t = (1/2) ∇ log p(X_t∣y) dt + dW_t, 0 ≤ t ≤ T, X_0 = x_0, where W is the n-dimensional Brownian motion. Because solving for X_t exactly is generally not possible, we use an Euler-Maruyama approximation and obtain the “unadjusted Langevin algorithm” ULA: X_{m+1} = X_m + δ ∇ log p(X_m∣y) + √(2δ) Z_{m+1}, with Z_{m+1} ∼ N(0, I_n). ULA is remarkably efficient when p(x∣y) is sufficiently regular.
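A minimal sketch of the ULA recursion for a generic smooth target; grad_log_post, the step size δ, and the Gaussian toy target in the example are assumptions made for illustration (in practice δ is tied to the Lipschitz constant of ∇ log p(x∣y)).

```python
import numpy as np

def ula(grad_log_post, x0, delta, n_samples, rng=None):
    """Unadjusted Langevin algorithm:
    X_{m+1} = X_m + delta * grad log p(X_m | y) + sqrt(2*delta) * Z_{m+1}."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        x = x + delta * grad_log_post(x) + np.sqrt(2 * delta) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Toy check on a standard Gaussian target, where grad log p(x) = -x.
chain = ula(lambda x: -x, x0=np.zeros(2), delta=0.1, n_samples=5000)
```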

  12. Metropolis-adjusted Langevin algorithm. ULA does not exactly target p(x∣y) because of the time-discrete approximation. In many problems this estimation bias is acceptable. The error can be removed by using a so-called Metropolis-Hastings correction. Given X_m at iteration m, we perform: 1. a ULA step X* = X_m + δ ∇ log p(X_m∣y) + √(2δ) Z_{m+1}, with Z_{m+1} ∼ N(0, I_n); 2. with probability ρ(X*, X_m) we set X_{m+1} = X*, else we set X_{m+1} = X_m, where ρ(X*, X_m) = min[1, p(X*∣y) q(X_m∣X*) / (p(X_m∣y) q(X*∣X_m))] and q denotes the ULA proposal density.
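A sketch of this Metropolis-adjusted version, written for a generic pair log_post / grad_log_post (both placeholders); the log_q terms implement the proposal-density ratio in step 2, and the parameter values in the toy example are assumptions.

```python
import numpy as np

def mala(log_post, grad_log_post, x0, delta, n_samples, rng=None):
    """Metropolis-adjusted Langevin algorithm: ULA proposal + accept/reject step."""
    rng = rng or np.random.default_rng()

    def log_q(x_to, x_from):
        # Log density (up to a constant) of the ULA proposal N(x_from + delta*grad, 2*delta*I)
        diff = x_to - x_from - delta * grad_log_post(x_from)
        return -np.sum(diff**2) / (4 * delta)

    x = np.asarray(x0, dtype=float)
    samples, accepted = [], 0
    for _ in range(n_samples):
        x_star = x + delta * grad_log_post(x) + np.sqrt(2 * delta) * rng.standard_normal(x.shape)
        log_rho = log_post(x_star) - log_post(x) + log_q(x, x_star) - log_q(x_star, x)
        if np.log(rng.uniform()) < min(0.0, log_rho):
            x, accepted = x_star, accepted + 1
        samples.append(x.copy())
    return np.array(samples), accepted / n_samples

# Toy check on a standard Gaussian target.
chain, acc_rate = mala(lambda x: -0.5 * np.sum(x**2), lambda x: -x,
                       x0=np.zeros(2), delta=0.5, n_samples=5000)
```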

  13. Metropolis-adjusted Langevin algorithm. Some observations: this correction removes the bias at the expense of additional variance; the efficiency of the method depends strongly on δ; the optimal efficiency is achieved for E(ρ) ≈ 0.6 as the dimension d → ∞.

  14. Metropolis-adjusted Langevin algorithm. Some observations: this correction removes the bias at the expense of additional variance; the efficiency of the method depends strongly on δ; the optimal efficiency is achieved for E(ρ) ≈ 0.6 as the dimension d → ∞. This suggests adapting the step size: 1. a ULA step X* = X_m + δ_{m+1} ∇ log p(X_m∣y) + √(2δ_{m+1}) Z_{m+1}, with Z_{m+1} ∼ N(0, I_n); 2. with probability ρ(X*, X_m) we set X_{m+1} = X*, else we set X_{m+1} = X_m, where ρ(X*, X_m) = min[1, p(X*∣y) q(X_m∣X*) / (p(X_m∣y) q(X*∣X_m))]; 3. update δ_{m+2} = δ_{m+1} + α_{m+1} (ρ(X*, X_m) − 0.6), for some sequence {α_m}_{m=1}^∞.
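The update in step 3 is a stochastic-approximation rule that pushes the acceptance probability towards 0.6; a small sketch, with the decaying sequence α_m = c/m as a purely illustrative choice, could look like the following.

```python
def adapt_step_size(delta, rho, m, c=0.01):
    """delta_{m+2} = delta_{m+1} + alpha_{m+1} * (rho - 0.6),
    with the illustrative choice alpha_m = c / m."""
    alpha = c / (m + 1)
    return max(delta + alpha * (rho - 0.6), 1e-8)   # keep the step size positive
```

Inside the MALA loop sketched above, this would be called once per iteration with rho = min(1, exp(log_rho)).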

  15. Non-smooth models. Suppose that p(x∣y) ∝ exp{−f(x) − g(x)}, (3) where f(x) and g(x) are l.s.c. convex functions from R^d → (−∞, +∞], f is L_f-Lipschitz differentiable, and g ∉ C¹. For example, f(x) = (1/2σ²)∥y − Ax∥² and g(x) = α∥Bx∥_† + 1_S(x), for some linear operators A, B, norm ∥⋅∥_†, and convex set S. Unfortunately, such non-smooth models are beyond the scope of ULA. Idea: regularise p(x∣y) to enable efficient Langevin sampling.

  16. Approximation of p(x∣y). Moreau-Yosida approximation of p(x∣y) (Pereyra, 2015): let λ > 0. We propose to approximate p(x∣y) with the density p^λ(x∣y) = exp[−f(x) − g^λ(x)] / ∫_{R^d} exp[−f(u) − g^λ(u)] du, where g^λ is the Moreau-Yosida envelope of g, given by g^λ(x) = inf_{u ∈ R^d} {g(u) + (2λ)^{−1} ∥u − x∥²}, and where λ controls the approximation error involved.
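Because the infimum defining g^λ is attained at prox_g^λ(x), the envelope can be evaluated directly from the proximal operator via g^λ(x) = g(prox_g^λ(x)) + (2λ)^{−1}∥prox_g^λ(x) − x∥². The sketch below checks this for the scalar example g(x) = ∣x∣, whose prox is soft-thresholding; the value of λ is an arbitrary illustrative choice.

```python
import numpy as np

def moreau_yosida_envelope(g, prox_g, x, lam):
    """g^lambda(x) = g(p) + ||p - x||^2 / (2*lam), with p = prox_{lam g}(x)."""
    p = prox_g(x, lam)
    return g(p) + np.sum((p - x)**2) / (2 * lam)

# Scalar example g(x) = |x|: the prox is soft-thresholding and g^lambda is the Huber function.
g = lambda x: np.sum(np.abs(x))
prox_abs = lambda x, lam: np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

val = moreau_yosida_envelope(g, prox_abs, np.array([2.0]), lam=0.5)
# val == 1.75, i.e. |2| - lam/2, the Huber value for |x| >= lam.
```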

  17. Moreau-Yosida approximations. Key properties (Pereyra, 2015; Durmus et al., 2018): 1. for all λ > 0, p^λ defines a proper density of a probability measure on R^d; 2. convexity and differentiability: p^λ is log-concave on R^d, and p^λ ∈ C¹ even if p is not differentiable, with ∇ log p^λ(x∣y) = −∇f(x) + {prox_g^λ(x) − x}/λ, where prox_g^λ(x) = argmin_{u ∈ R^N} g(u) + (1/2λ)∥u − x∥²; moreover, ∇ log p^λ is Lipschitz continuous with constant L ≤ L_f + λ^{−1}; 3. approximation error between p^λ(x∣y) and p(x∣y): lim_{λ→0} ∥p^λ − p∥_TV = 0, and if g is L_g-Lipschitz, then ∥p^λ − p∥_TV ≤ λ L_g².
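Properties 1-2 make p^λ(x∣y) a valid, smooth target for the Langevin machinery of the previous slides: replacing ∇ log p by the expression above gives a Moreau-Yosida regularised ULA step, the basis of the proximal MCMC methods in this talk. A minimal sketch, assuming grad_f and a λ-proximal operator prox_g are supplied by the user (both placeholders here):

```python
import numpy as np

def my_ula(grad_f, prox_g, x0, delta, lam, n_samples, rng=None):
    """Langevin sampling of the smoothed posterior p^lambda(x|y), using
    grad log p^lambda(x|y) = -grad f(x) + (prox_{lam g}(x) - x) / lam."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        grad = -grad_f(x) + (prox_g(x, lam) - x) / lam
        x = x + delta * grad + np.sqrt(2 * delta) * rng.standard_normal(x.shape)
        samples.append(x.copy())
    return np.array(samples)

# Toy example: f = 0 (flat likelihood) and g(x) = |x|, so the chain targets a smoothed Laplace density.
prox_abs = lambda x, lam: np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
chain = my_ula(lambda x: np.zeros_like(x), prox_abs, x0=np.zeros(1),
               delta=0.05, lam=0.5, n_samples=5000)
```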

  18. Illustration. Examples of Moreau-Yosida approximations: p(x) ∝ exp(−x⁴), p(x) ∝ 1_{[−0.5, 0.5]}(x), and p(x) ∝ exp(−∣x∣). Figure: true densities (solid blue) and approximations (dashed red).
