1. Introduction to Bayesian inversion & MCMC
Ville Kolehmainen, Department of Applied Physics, University of Eastern Finland, Kuopio, Finland
CoE Summer School, Helsinki, June 2019

2. Contents
◮ Introduction
◮ Bayesian inversion
◮ Gaussian case
◮ About MAP estimate
◮ MCMC
◮ Numerical examples

3. What are inverse problems?
◮ Consider measurement problems: estimate the quantity of interest x ∈ R^n from a (noisy) measurement of A(x) ∈ R^d, where A is a known mapping.
◮ Inverse problems are characterized as those measurement problems which are ill-posed:
  1. The problem is non-unique (e.g., fewer measurements than unknowns).
  2. The solution is unstable w.r.t. measurement noise and modelling errors (e.g., model reduction, inaccurately known nuisance parameters in the model A(x)).

4. An introductory example
◮ Consider 2D deconvolution (image deblurring): given a noisy and blurred image m ∈ R^n, m = Ax + e, the objective is to reconstruct the original image x ∈ R^n.
◮ The forward model x ↦ Ax implements discrete convolution (here the convolution kernel is a Gaussian blurring kernel with a std of 6 pixels); a small numerical sketch follows below.
[Figure: Left, original image x. Right, observed image m = Ax + e.]
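A minimal numerical sketch of this forward model, assuming a 50 × 50 image, a separable Gaussian blur and an additive noise level that are illustrative choices rather than values quoted on the slides:

```python
# Sketch of the deblurring forward model m = A x + e.
import numpy as np

n_side = 50        # 50 x 50 image, as in the slide figures
std_pix = 6.0      # std of the Gaussian blurring kernel, in pixels

# 1D Gaussian convolution matrix; the 2D operator is its Kronecker square
# acting on the vectorized (stacked) image.
idx = np.arange(n_side)
A1 = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / std_pix) ** 2)
A1 /= A1.sum(axis=1, keepdims=True)
A = np.kron(A1, A1)

rng = np.random.default_rng(0)
x_true = rng.random(n_side * n_side)          # stand-in for the true image x
e = 0.01 * rng.standard_normal(A.shape[0])    # additive measurement noise
m = A @ x_true + e                            # observed blurred, noisy image
```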

5. Example (cont.)
◮ The matrix A has a trivial null space, null(A) = {0}, so the solution is unique (i.e., (A^T A)^{-1} exists).
◮ However, the problem is unstable: A has a "numerical" null space, i.e., ∥Aw∥ = ∥Σ_k σ_k ⟨w, v_k⟩ u_k∥ ≈ 0 for certain ∥w∥ = 1.
◮ The figure shows the least squares (LS) solution
  x_LS = arg min_x ∥m − Ax∥² ⇒ x_LS = (A^T A)^{-1} A^T m
[Figure: Left, true image x. Right, LS solution x_LS.]
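Continuing the sketch above, the naive LS solution can be computed directly; because A^T A is nearly singular, the reconstruction is dominated by amplified noise, which is the instability referred to here:

```python
# Least squares solution x_LS = (A^T A)^{-1} A^T m (continues the sketch above).
# A^T A is numerically close to singular, so the solve is formally possible
# but the result is swamped by amplified noise.
x_ls = np.linalg.solve(A.T @ A, A.T @ m)

rel_err = np.linalg.norm(x_ls - x_true) / np.linalg.norm(x_true)
print("relative error of x_LS:", rel_err)
print("condition number of A^T A:", np.linalg.cond(A.T @ A))
```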

6. Regularization
◮ The ill-posed problem is replaced by a well-posed approximation, whose solution is "hopefully" close to the true solution.
◮ Typically these are modifications of the associated LS problem ∥m − Ax∥².
◮ Examples of methods: Tikhonov regularization, truncated iterations.
◮ Consider the LS problem
  x_LS = arg min_x {∥m − Ax∥₂²} ⇒ (A^T A) x_LS = A^T m,  with B := A^T A.
  Uniqueness: B^{-1} exists if null(A) = {0}. Stability of the solution?

7. Regularization
◮ Example (cont.): in Tikhonov regularization, the LS problem is replaced with
  x_TIK = arg min_x {∥m − Ax∥₂² + α∥x∥₂²} ⇒ (A^T A + αI) x_TIK = A^T m,  with B̃ := A^T A + αI.
  Uniqueness: α > 0 ⇒ null(B̃) = {0} ⇒ B̃^{-1} exists. Stability of the solution is guaranteed by choosing α "large enough".
◮ Regularization poses an (implicit) prior about the solution. These assumptions are sometimes well hidden.
[Figure: Left, true image x. Right, Tikhonov solution x_TIK.]
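A corresponding Tikhonov reconstruction for the sketch above; the value of α is an illustrative assumption, not one quoted in the slides:

```python
# Tikhonov-regularized reconstruction (continues the deblurring sketch above).
alpha = 1e-2   # regularization parameter, illustrative choice
x_tik = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ m)

rel_err_tik = np.linalg.norm(x_tik - x_true) / np.linalg.norm(x_true)
print("relative error of x_TIK:", rel_err_tik)
```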

8. Statistical (Bayesian) inversion
◮ The inverse problem is recast as a problem of Bayesian inference. The key idea is to extract estimates and assess uncertainty about the unknowns based on
  ◮ measured data,
  ◮ a model of the measurement process,
  ◮ a model of the a priori information.
◮ Ill-posedness is removed by explicit use of prior models!
◮ Systematic handling of model uncertainties and model reduction (⇒ approximation error theory).
[Figure: Left, true image x. Right, Bayesian estimate x_CM.]

9. Bayesian inversion
◮ All variables in the model are considered as random variables. The randomness reflects our uncertainty about their actual values.
◮ The degree of uncertainty is coded in the probability distribution models of these variables.
◮ The complete model of the inverse problem is the posterior probability distribution
  π(x | m) = π_pr(x) π(m | x) / π(m),  m = m_observed,   (1)
  where
  ◮ π(m | x) is the likelihood density; the model of the measurement process,
  ◮ π_pr(x) is the prior density; the model of the a priori information,
  ◮ π(m) is a normalization constant.
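In computations the posterior (1) is typically handled through its unnormalized logarithm. A minimal sketch, assuming Gaussian noise and a Gaussian prior (these particular choices anticipate the Gaussian case below and are not stated on this slide):

```python
# Unnormalized log-posterior: log pi(x | m) = log pi(m | x) + log pi_pr(x) + const.
import numpy as np

def log_posterior(x, m, A, noise_std, prior_std):
    """Log of Eq. (1) up to the constant -log pi(m), for m = A x + e."""
    log_likelihood = -0.5 * np.sum((m - A @ x) ** 2) / noise_std ** 2
    log_prior = -0.5 * np.sum(x ** 2) / prior_std ** 2
    return log_likelihood + log_prior
```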

10. ◮ The posterior density model is a function on an n-dimensional space, π(x | m): R^n → R_+, where n is usually large.
◮ To obtain a "practical solution", the posterior model is summarized by estimates that answer questions such as:
  ◮ "What is the most probable value of x over π(x | m)?"
  ◮ "What is the mean of x over π(x | m)?"
  ◮ "In what interval are the values of x with 90% (posterior) probability?"

11. Point estimates
◮ Maximum a posteriori (MAP) estimate:
  x_MAP = arg max_{x ∈ R^n} π(x | m).
◮ Conditional mean (CM) estimate:
  x_CM = ∫_{R^n} x π(x | m) dx.
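A small, purely illustrative 1D sketch (not from the slides) of how the two point estimates differ when the posterior is known on a grid: the MAP estimate is the maximizer of π(x | m), the CM estimate its probability-weighted mean.

```python
# MAP and CM estimates for a hypothetical 1D posterior evaluated on a grid.
import numpy as np

xs = np.linspace(-5.0, 5.0, 2001)
dx = xs[1] - xs[0]

# A bimodal toy posterior (unnormalized), then normalized on the grid.
unnorm = np.exp(-0.5 * (xs - 1.0) ** 2) + 0.5 * np.exp(-0.5 * ((xs + 2.0) / 0.5) ** 2)
post = unnorm / (unnorm.sum() * dx)

x_map = xs[np.argmax(post)]          # arg max of pi(x | m)
x_cm = np.sum(xs * post) * dx        # integral of x * pi(x | m) dx
print(f"x_MAP = {x_map:.3f}, x_CM = {x_cm:.3f}")
```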

13. Spread estimates
◮ Covariance:
  Γ_{x|m} = ∫_{R^n} (x − x_CM)(x − x_CM)^T π(x | m) dx
◮ Confidence intervals: given 0 < τ < 100, compute a_k and b_k s.t.
  ∫_{−∞}^{a_k} π_k(x_k) dx_k = ∫_{b_k}^{∞} π_k(x_k) dx_k = (100 − τ)/200,
  where π_k(x_k) is the marginal density
  π_k(x_k) = ∫_{R^{n−1}} π(x | m) dx_1 ⋯ dx_{k−1} dx_{k+1} ⋯ dx_n.
  The interval I_k(τ) = [a_k, b_k] contains τ% of the mass of the marginal density.
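Continuing the 1D grid sketch above, the τ% interval is obtained by cutting (100 − τ)/200 of the probability mass from each tail of the marginal density:

```python
# 90% confidence (credible) interval for the toy 1D posterior above.
tau = 90.0
tail = (100.0 - tau) / 200.0            # mass cut from each tail

cdf = np.cumsum(post) * dx              # cumulative distribution on the grid
a_k = xs[np.searchsorted(cdf, tail)]        # lower limit
b_k = xs[np.searchsorted(cdf, 1.0 - tail)]  # upper limit
print(f"I({tau:.0f}%) = [{a_k:.3f}, {b_k:.3f}]")
```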

14. Illustration of confidence intervals
◮ x = (x_1, x_2)^T ∈ R^2
◮ π(x_1 | m) = ∫ π(x | m) dx_2 (middle)
◮ π(x_2 | m) = ∫ π(x | m) dx_1 (right)
◮ Solid vertical lines show the mean value x_CM, dotted lines the 90% confidence limits.
[Figure: Left, contours of π(x | m) with x_CM marked by +. Middle, marginal density π(x_1 | m). Right, marginal density π(x_2 | m).]

15. Gaussian π(x | m)
◮ Let m = Ax + e, e ∼ N(e_*, Γ_e) independent of x, and π_pr(x) = N(x_*, Γ_x). Then
  π(x | m) ∝ π(m | x) π_pr(x)
  = exp( −(1/2)(m − Ax − e_*)^T Γ_e^{-1}(m − Ax − e_*) − (1/2)(x − x_*)^T Γ_x^{-1}(x − x_*) )
◮ π(x | m) is fully determined by its mean and covariance:
  x_CM = Γ_{x|m} (A^T Γ_e^{-1}(m − e_*) + Γ_x^{-1} x_*)
  Γ_{x|m} = (A^T Γ_e^{-1} A + Γ_x^{-1})^{-1}
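These closed-form expressions can be evaluated directly for the deblurring sketch above; the noise and prior parameters below are illustrative assumptions:

```python
# Gaussian posterior mean and covariance (continues the deblurring sketch above).
noise_var, prior_var = 0.01 ** 2, 1.0
e_star = np.zeros(A.shape[0])                  # noise mean e_*
x_star = np.zeros(A.shape[1])                  # prior mean x_*
Gamma_e_inv = np.eye(A.shape[0]) / noise_var   # Gamma_e^{-1}
Gamma_x_inv = np.eye(A.shape[1]) / prior_var   # Gamma_x^{-1}

Gamma_post = np.linalg.inv(A.T @ Gamma_e_inv @ A + Gamma_x_inv)   # Gamma_{x|m}
x_cm = Gamma_post @ (A.T @ Gamma_e_inv @ (m - e_star) + Gamma_x_inv @ x_star)

post_std = np.sqrt(np.diag(Gamma_post))        # pixelwise posterior std (uncertainty map)
```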

16. ◮ Let L_e^T L_e = Γ_e^{-1} and L_x^T L_x = Γ_x^{-1}. Then we can write
  π(x | m) ∝ exp{ −(1/2) F(x) },
  where
  F(x) = ∥L_e(m − Ax − e_*)∥₂² + ∥L_x(x − x_*)∥₂²
  and
  x_MAP = arg min_x F(x)
  ⇒ connection to Tikhonov regularization!
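One way to exploit this in practice is to minimize F(x) as a stacked least squares problem, which makes the Tikhonov connection explicit. A sketch using the matrices from the previous code block; building L_e and L_x from Cholesky factors is an assumed, standard choice rather than something stated on the slide:

```python
# MAP estimate via stacked least squares (continues the Gaussian sketch above).
L_e = np.linalg.cholesky(Gamma_e_inv).T    # L_e^T L_e = Gamma_e^{-1}
L_x = np.linalg.cholesky(Gamma_x_inv).T    # L_x^T L_x = Gamma_x^{-1}

A_stack = np.vstack([L_e @ A, L_x])
b_stack = np.concatenate([L_e @ (m - e_star), L_x @ x_star])
x_map, *_ = np.linalg.lstsq(A_stack, b_stack, rcond=None)
# For a Gaussian posterior, x_MAP coincides with x_CM computed above.
```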

17. Example
◮ Consider the original form of Tikhonov regularization:
  x_TIK = arg min_x {∥m − Ax∥₂² + α∥x∥₂²} ⇒ x_TIK = (A^T A + αI)^{-1} A^T m
◮ From the Bayesian viewpoint, x_TIK corresponds to x_CM under the following assumptions:
  ◮ Measurement model m = Ax + e, with x and e mutually independent and e ∼ N(0, I).
  ◮ x is a priori mutually independent zero-mean white noise with variance 1/α.
◮ The original idea of the Tikhonov method was to approximate A^T A with a matrix A^T A + αI that is invertible and produces a stable solution.
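A quick numerical check of this correspondence, reusing the deblurring matrices from the sketches above (the value of α is again an illustrative choice):

```python
# With Gamma_e = I and Gamma_x = (1/alpha) I, the posterior mean equals x_TIK.
alpha = 1e-2
I_n = np.eye(A.shape[1])

x_tik = np.linalg.solve(A.T @ A + alpha * I_n, A.T @ m)        # Tikhonov solution
x_cm_white = np.linalg.inv(A.T @ A + alpha * I_n) @ (A.T @ m)  # Gaussian posterior mean

print(np.allclose(x_tik, x_cm_white))   # True up to numerical precision
```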
