Bayesian inference and mathematical imaging. Part I: Bayesian analysis and decision theory.
Dr. Marcelo Pereyra, http://www.macs.hw.ac.uk/~mp71/
Maxwell Institute for Mathematical Sciences, Heriot-Watt University
January 2019, CIRM, Marseille.
Outline
1. Imaging inverse problems
2. Bayesian modelling
3. Bayesian inference
4. Conclusion
Imaging inverse problems
We are interested in an unknown image x ∈ ℝᵈ. We measure y, related to x by some mathematical model. For example, in many imaging problems y = Ax + w, for some linear operator A and additive noise w.
Regularisation
If AᵀA is ill-conditioned or rank deficient, then y alone does not allow x to be recovered reliably; in other words, there is significant uncertainty about x given y. To reduce this uncertainty and deliver meaningful results, we need to regularise the estimation problem y → x to make it well posed.
Regularisation
For example, given a penalty function h : ℝᵈ → [0, ∞) promoting expected properties of x, we can construct the estimator

$\hat{x} = \operatorname*{argmin}_{x \in \mathbb{R}^d} \|y - Ax\|_2^2 + \theta h(x),$  (1)

which combines the data-fidelity term ‖y − Ax‖₂² with the penalty h, where the "regularisation" parameter θ > 0 controls the balance. When h is convex and lower semicontinuous, x̂ can be computed efficiently by (proximal) convex optimisation (Chambolle and Pock, 2016). Other data-fidelity terms could be considered too (e.g., ‖y − Ax‖₁).
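As a concrete illustration, here is a minimal sketch of solving (1) by proximal gradient descent (ISTA) for the special case h(x) = ‖x‖₁; the operator A, data y, and parameter θ are assumed given, and the solver settings are illustrative rather than tuned.

```python
# Minimal ISTA sketch for minimising ||y - A x||_2^2 + theta * ||x||_1.
# A, y and theta are assumed given; illustration only.
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, y, theta, n_iter=500):
    L = 2.0 * np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - y)       # gradient of the data-fidelity term
        x = soft_threshold(x - grad / L, theta / L)
    return x
```

The same template extends to other convex penalties whenever their proximal operator is cheap to evaluate.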
Illustrative example: astronomical image reconstruction
Recover x ∈ ℝᵈ from the low-dimensional observation y = MFx + w, where F is the continuous Fourier transform and M ∈ ℂᵐˣᵈ is a mask. We use the estimator

$\hat{x} = \operatorname*{argmin}_{x \in \mathbb{R}_+^d} \|y - MFx\|_2^2 + \theta \|\Psi x\|_1,$  (2)

where Ψ is a specialised wavelet dictionary and θ > 0 is some parameter.
Figure: Radio-interferometric image reconstruction of the W28 supernova (|y| and x̂).
Illustrative example: astronomical image reconstruction
Modern convex optimisation can compute x̂ very efficiently... with parallelised and distributed algorithms... with theoretical convergence guarantees... and GPU implementations... and with non-convex extensions too. Also, if we had abundant training data (we do not), we could learn a neural network to recover x from y, or alternatively learn the map y → x̂ for efficiency. So the problem is essentially solved, right?
Not really...
Elephant 1: what is the uncertainty about x̂?
Figure: |y| and x̂_MAP.
How confident are we about all these structures in the image? What is the error in their intensity, position, and spectral properties? If we use x̂_MAP to derive physical quantities, what error bars should we put on them?
Illustrative example: magnetic resonance imaging
We use very similar techniques to produce magnetic resonance images...
Figure: Magnetic resonance imaging of a brain lesion (x̂ and zoom).
How can we quantify our uncertainty about the brain lesion in the image?
Illustrative example: magnetic resonance imaging
What about this other solution x̂′ to the problem, with no lesion?
Figure: Magnetic resonance imaging of a brain lesion (x̂′ and zoom).
Do we have any arguments to reject this solution?
Elephant 1: what is the uncertainty about x̂?
Another example, related to sparse super-resolution in live-cell microscopy.
Figure: Live-cell microscopy data (Zhu et al., 2012); y and x̂_MAP (with zoom).
The image is sharpened to enhance molecule-position measurements, but what is the precision of the procedure?
Elephant 2: multiple and partially unknown models
We usually have several alternative models/cost functions with which to recover x:

$\hat{x}_1 = \operatorname*{argmin}_{x \in \mathbb{R}^d} \|y - A_1 x\|_2^2 + \theta_1 h_1(x), \qquad \hat{x}_2 = \operatorname*{argmin}_{x \in \mathbb{R}^d} \|y - A_2 x\|_2^2 + \theta_2 h_2(x).$  (3)

How can we compare them without ground truth available? Can we use several models simultaneously?
Elephant 2: multiple and partially unknown models
Some of the model parameters might also be unknown; e.g., θ ∈ ℝ₊ in

$\hat{x}_1 = \operatorname*{argmin}_{x \in \mathbb{R}^d} \|y - A_1 x\|_2^2 + \theta h_1(x).$  (4)

Then θ parametrises a class of models for y → x. How can we select θ without using ground truth? Could we use all models simultaneously?
Other elephants in the neighbourhood...
Why do we formulate solutions to imaging problems as penalised data-fitting problems? Are there other relevant formulations?
Suppose we had a specific meaningful way of measuring estimation error, reflecting specific aspects of our problem or the application considered. What would be the optimal estimator then?
Other elephants in the neighbourhood...
Should solutions to imaging problems always be images, even when their purpose is to inform decision-making and scientific conclusions?
What other mathematical objects could we construct to represent solutions to imaging problems? E.g., could curves (videos) be interesting solutions?
Outline
1. Imaging inverse problems
2. Bayesian modelling
3. Bayesian inference
4. Conclusion
Introduction
Bayesian statistics is a mathematical framework for deriving inferences about x from some observed data y and the prior knowledge available. Adopting a subjective view of probability, we represent x as a random quantity and model our knowledge about x by using probability distributions. To derive inferences about x from y, we postulate a joint statistical model p(x, y), typically specified via the decomposition p(x, y) = p(y|x) p(x).
Introduction
The decomposition p(x, y) = p(y|x) p(x) has two key ingredients:
The likelihood function: the conditional distribution p(y|x) that models the data-observation process (forward model).
The prior distribution: the marginal p(x) = ∫ p(x, y) dy that models our knowledge about x "before observing y".
For example, for y = Ax + w with w ∼ N(0, σ²I), we have y|x ∼ N(Ax, σ²I), or equivalently p(y|x) ∝ exp{−‖y − Ax‖₂²/(2σ²)}.
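As a quick numerical illustration of this forward model, the sketch below simulates y = Ax + w and evaluates the Gaussian log-likelihood up to an additive constant; the dimensions, operator, and noise level are made-up toy values.

```python
# Toy simulation of the Gaussian forward model y = A x + w. Illustration only.
import numpy as np

rng = np.random.default_rng(0)
d, m, sigma = 16, 8, 0.1
A = rng.standard_normal((m, d))                # linear operator
x = rng.standard_normal(d)                     # "true" image
y = A @ x + sigma * rng.standard_normal(m)     # noisy observation

def log_likelihood(x, y, A, sigma):
    """log p(y|x) up to an additive constant, for w ~ N(0, sigma^2 I)."""
    return -np.sum((y - A @ x) ** 2) / (2.0 * sigma ** 2)
```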
Prior distribution
The prior distribution is often of the form p(x) ∝ e^{−θᵀh(x)} 1_Ω(x), for some h : ℝᵈ → ℝᵐ, parameter θ ∈ ℝᵐ, and constraint set Ω. This prior is essentially specifying the expectation

$E(h) = \int_\Omega h(x)\, p(x)\, dx = -\nabla_\theta \log C(\theta),$ where $C(\theta) = \int_\Omega e^{-\theta^\top h(x)}\, dx.$
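The identity follows by differentiating the log normalising constant under the integral sign; a short derivation for scalar θ (under standard regularity assumptions):

```latex
\frac{d}{d\theta} \log C(\theta)
  = \frac{1}{C(\theta)} \frac{d}{d\theta} \int_\Omega e^{-\theta h(x)}\,dx
  = \int_\Omega -h(x)\, \frac{e^{-\theta h(x)}}{C(\theta)}\,dx
  = -\int_\Omega h(x)\, p(x)\,dx
  = -\mathbb{E}(h).
```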
Prior distribution
For example, priors of the form p(x) ∝ e^{−θ‖Ψx‖†}, for some dictionary Ψ ∈ ℝᵖˣᵈ and norm ‖·‖†, are encoding E(‖Ψx‖†) = d/θ. See Pereyra et al. (2015) for more details and other examples.
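This identity is easy to sanity-check by Monte Carlo in the simplest case d = 1, Ψ = 1, ‖·‖† = |·|, where the prior reduces to a Laplace density with scale 1/θ (a toy check, not from the slides):

```python
# Monte Carlo check of E(|x|) = d/theta in 1-D: p(x) ∝ exp(-theta*|x|)
# is a Laplace density with scale 1/theta. Illustration only.
import numpy as np

rng = np.random.default_rng(1)
theta = 3.0
samples = rng.laplace(loc=0.0, scale=1.0 / theta, size=1_000_000)
print(np.mean(np.abs(samples)))  # ≈ 1/theta ≈ 0.333
```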
Posterior distribution
Given an observation y, we naturally base our inferences on the posterior distribution p(x|y). We derive p(x|y) from the likelihood p(y|x) and the prior p(x) by using Bayes' theorem,

$p(x|y) = \frac{p(y|x)\, p(x)}{p(y)},$ where $p(y) = \int p(y|x)\, p(x)\, dx$ measures model fit to data.

The conditional p(x|y) models our knowledge about x after observing y. Observe that p(x) may significantly regularise the estimation problem and address identifiability issues in p(y|x) (e.g., when A is rank deficient).
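For the linear Gaussian model above with a conjugate Gaussian prior x ∼ N(0, ρ²I), the posterior is available in closed form; the sketch below computes its mean and covariance with made-up toy values (a standard conjugate-Gaussian result, not a method specific to these slides):

```python
# Closed-form posterior for y = A x + w, w ~ N(0, sigma^2 I), with
# prior x ~ N(0, rho^2 I). Toy values; illustration only.
import numpy as np

rng = np.random.default_rng(2)
d, m, sigma, rho = 16, 8, 0.1, 1.0
A = rng.standard_normal((m, d))
y = rng.standard_normal(m)                     # stand-in for observed data

# p(x|y) = N(mu, Sigma):
Sigma = np.linalg.inv(A.T @ A / sigma**2 + np.eye(d) / rho**2)
mu = Sigma @ A.T @ y / sigma**2

# Marginal posterior standard deviations give simple pixelwise error bars.
std = np.sqrt(np.diag(Sigma))
```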
Maximum-a-posteriori (MAP) estimation
The predominant Bayesian approach in imaging to extract a solution from p(x|y) is MAP estimation,

$\hat{x}_{\mathrm{MAP}} = \operatorname*{argmax}_{x \in \mathbb{R}^d} p(x|y) = \operatorname*{argmin}_{x \in \mathbb{R}^d} -\log p(y|x) - \log p(x) + \log p(y),$  (5)

where the last term log p(y) is constant in x and can be dropped. When p(x|y) is log-concave, computing x̂_MAP is a convex optimisation problem that can be solved efficiently (Chambolle and Pock, 2016).
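To connect (5) back to (1): with the Gaussian likelihood above and a prior p(x) ∝ e^{−θh(x)}, the MAP estimator is exactly a penalised least-squares problem (a standard correspondence, stated here for the unconstrained case):

```latex
\hat{x}_{\mathrm{MAP}}
  = \operatorname*{argmin}_{x \in \mathbb{R}^d}
      \frac{\|y - Ax\|_2^2}{2\sigma^2} + \theta\, h(x)
  = \operatorname*{argmin}_{x \in \mathbb{R}^d}
      \|y - Ax\|_2^2 + 2\sigma^2\theta\, h(x),
```

so the regularisation parameter of (1) corresponds to 2σ²θ here.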
Illustrative example: astronomical image reconstruction
Recover x ∈ ℝᵈ from the low-dimensional degraded observation y = MFx + w, where F is the continuous Fourier transform, M ∈ ℂᵐˣᵈ is a measurement operator, and w is Gaussian noise. We use the model

$p(x|y) \propto \exp\!\left(-\|y - MFx\|_2^2 / 2\sigma^2 - \theta \|\Psi x\|_1\right) 1_{\mathbb{R}_+^d}(x).$  (6)

Figure: Radio-interferometric image reconstruction of the W28 supernova (y and x̂_MAP).