Bayesian Inverse Problems and Uncertainty Quantification


  1. Bayesian Inverse Problems and Uncertainty Quantification. Hanne Kekkonen, Centre for Mathematical Sciences, University of Cambridge. June 4, 2019.

  2. Inverse problems arise naturally from applications.

  3. Inverse problems are ill-posed. We want to recover the unknown u from a noisy measurement m: m = Au + noise, where A is a forward operator that usually causes loss of information.

  4. Well-posedness as defined by Jacques Hadamard: 1. Existence: there exists at least one solution. 2. Uniqueness: there is at most one solution. 3. Stability: the solution depends continuously on the data. Inverse problems are ill-posed: they break at least one of these conditions.

  5. The naive inversion does not produce stable solutions. We want to approximate u from a measurement m = Au + n, where A : X → Y is linear and n is noise. One approach is the least squares method $\hat u = \arg\min_{u \in X} \|Au - m\|_Y^2$. Problem: multiple minima and sensitive dependence on the data m.
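
To make the instability concrete, here is a small numerical sketch (my own illustration, not from the slides): a Gaussian-blur matrix stands in for the forward operator A, and the plain least-squares solution is compared for clean and slightly noisy data. All parameter values are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy forward operator: a Gaussian blur, which smooths and hence loses information.
npts = 50
x = np.linspace(0, 1, npts)
A = np.exp(-(x[:, None] - x[None, :])**2 / (2 * 0.1**2))
A /= A.sum(axis=1, keepdims=True)

u_true = np.sin(2 * np.pi * x)                        # "unknown" we try to recover
m_clean = A @ u_true
m_noisy = m_clean + 1e-3 * rng.standard_normal(npts)  # tiny measurement noise

# Naive least squares: u = argmin ||Au - m||^2
u_ls_clean = np.linalg.lstsq(A, m_clean, rcond=None)[0]
u_ls_noisy = np.linalg.lstsq(A, m_noisy, rcond=None)[0]

print("cond(A)           :", np.linalg.cond(A))
print("error, clean data :", np.linalg.norm(u_ls_clean - u_true))
print("error, noisy data :", np.linalg.norm(u_ls_noisy - u_true))
# Because A is badly conditioned, noise that is invisible in the data
# is hugely amplified in the least-squares reconstruction.
```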

  6. Tikhonov regularisation is a classical method for solving ill-posed problems. We want to approximate u from a measurement m = Au + n, where A : X → Y is linear and n is noise. The problem is ill-posed, so we add a regularising term and get $\hat u = \arg\min_{u \in E \subset X} \big( \|Au - m\|_Y^2 + \alpha \|u\|_E^2 \big)$. Regularisation gives a stable approximate solution of the inverse problem.
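
A minimal sketch of Tikhonov regularisation for the same toy blur operator (again my own illustration, not the speaker's code): the penalised minimiser solves the normal equations (AᵀA + αI)u = Aᵀm, and a reasonable α stabilises the reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same Gaussian-blur forward operator as in the previous sketch.
npts = 50
x = np.linspace(0, 1, npts)
A = np.exp(-(x[:, None] - x[None, :])**2 / (2 * 0.1**2))
A /= A.sum(axis=1, keepdims=True)

u_true = np.sin(2 * np.pi * x)
m = A @ u_true + 1e-3 * rng.standard_normal(npts)

def tikhonov(A, m, alpha):
    """Minimise ||Au - m||^2 + alpha*||u||^2 via the normal equations."""
    k = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(k), A.T @ m)

for alpha in [1e-8, 1e-6, 1e-4, 1e-2]:
    u_alpha = tikhonov(A, m, alpha)
    print(f"alpha = {alpha:.0e}   error = {np.linalg.norm(u_alpha - u_true):.3f}")
# A well-chosen alpha trades a little bias for a large gain in stability.
```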

  7. Bayes' formula combines data and a priori information. We want to reconstruct the most probable $u \in \mathbb{R}^k$ in light of: Measurement information: $M \mid u \sim P_u$ with Lebesgue density $\rho(m \mid u) = \rho_\varepsilon(m - Au)$. A priori information: $U \sim \Pi_{pr}$ with Lebesgue density $\pi_{pr}(u)$. Bayes' formula: given a measurement, we can update the prior to a posterior distribution via $\pi(u \mid m) \propto \pi_{pr}(u)\, \rho(m \mid u)$. The result of Bayesian inversion is the posterior distribution $\pi(u \mid m)$.

  8. The result of Bayesian inversion is the posterior distribution, but typically one looks at estimates. Maximum a posteriori (MAP) estimate: $\arg\max_{u \in \mathbb{R}^n} \pi(u \mid m)$. Conditional mean (CM) estimate: $\int_{\mathbb{R}^n} u\, \pi(u \mid m)\, du$.
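
The sketch below (illustrative only, not from the talk) evaluates an unnormalised posterior on a grid for a one-dimensional toy problem with a Laplace prior, so the MAP and CM estimates differ; all numbers are made up for the example.

```python
import numpy as np

# One-dimensional toy problem: m = a*u + noise, noise ~ N(0, sigma^2).
a, sigma = 0.5, 0.3
m = 0.4                                    # a made-up observed measurement

u = np.linspace(-5, 5, 20001)              # grid for the unknown
du = u[1] - u[0]

likelihood = np.exp(-0.5 * (m - a * u)**2 / sigma**2)   # rho(m | u)
prior = np.exp(-np.abs(u))                 # Laplace prior, so MAP != CM
posterior = likelihood * prior             # Bayes: pi(u|m) is proportional to pi_pr(u)*rho(m|u)
posterior /= posterior.sum() * du          # normalise on the grid

u_map = u[np.argmax(posterior)]            # maximum a posteriori estimate
u_cm = np.sum(u * posterior) * du          # conditional mean estimate

print("MAP estimate:", round(u_map, 3))
print("CM  estimate:", round(u_cm, 3))
```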

  9. Gaussian example. Assume we are interested in the measurement model M = AU + N, where: A : X → Y, with $X = \mathbb{R}^d$ and $Y = \mathbb{R}^k$; N is white Gaussian noise; U follows a Gaussian prior. The posterior has density $\pi^m(u) = \pi(u \mid m) \propto \exp\!\big( -\tfrac{1}{2}\|m - Au\|_{\mathbb{R}^k}^2 - \tfrac{1}{2}\|u\|_{\Sigma}^2 \big)$. We can use the mean of the posterior as a point estimator, but having the whole posterior allows uncertainty quantification.
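
For this linear Gaussian model the posterior is available in closed form: with unit noise covariance and prior covariance Σ, the posterior is N((AᵀA + Σ⁻¹)⁻¹Aᵀm, (AᵀA + Σ⁻¹)⁻¹). A short sketch (illustrative dimensions and prior, my own choice) computing the posterior mean and marginal standard deviations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Linear Gaussian model: M = AU + N with N ~ N(0, I_k) and prior U ~ N(0, Sigma).
d, k = 20, 10
A = rng.standard_normal((k, d)) / np.sqrt(d)
Sigma = np.eye(d)                          # illustrative prior covariance

u_true = rng.standard_normal(d)
m = A @ u_true + rng.standard_normal(k)

# Conjugacy: the posterior is N(mean, C) with
#   C    = (A^T A + Sigma^{-1})^{-1}
#   mean = C A^T m
C = np.linalg.inv(A.T @ A + np.linalg.inv(Sigma))
post_mean = C @ A.T @ m
post_std = np.sqrt(np.diag(C))             # pointwise (marginal) uncertainty

print("posterior mean (first 5):", np.round(post_mean[:5], 3))
print("posterior std  (first 5):", np.round(post_std[:5], 3))
```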

  10. Why are we interested in uncertainty quantification?

  11. Uncertainty quantification has many applications. Studying the whole posterior distribution instead of just a point estimate offers us more information: confidence and credible sets; weather and climate predictions; geological sensing; Bayesian search theory. Figure: search for the wreckage of Air France flight AF 447, Stone et al.

  12. What do we mean by uncertainty quantification? -I’m going to die? -POSSIBLY. -Possibly? You turn up when people are possibly going to die? -OH, YES. IT’S QUITE THE NEW THING. IT’S BECAUSE OF THE UNCERTAINTY PRINCIPLE. -What’s that? -I’M NOT SURE. -That’s very helpful. -I THINK IT MEANS PEOPLE MAY OR MAY NOT DIE. I HAVE TO SAY IT’S PLAYING HOB WITH MY SCHEDULE, BUT I TRY TO KEEP UP WITH MODERN THOUGHT. -Terry Pratchett, The Fifth Elephant

  13. Bayesian credible set. A Bayesian credible set is a region in the posterior distribution that contains a large fraction of the posterior mass.
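
For example, given posterior samples (in practice typically produced by MCMC), an equal-tailed 95% credible interval is just a pair of quantiles. A minimal sketch with stand-in samples (my own illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in posterior samples; a skewed gamma distribution plays the
# role of a non-Gaussian posterior here.
samples = rng.gamma(shape=3.0, scale=1.0, size=100_000)

# Equal-tailed 95% credible interval: 95% of the posterior mass lies inside.
lo, hi = np.quantile(samples, [0.025, 0.975])
print(f"95% credible interval: [{lo:.2f}, {hi:.2f}]")
```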

  14. Frequentist confidence region.

  15. Consistency of a Bayesian solution. Once we have a Bayesian solution, the natural next step is to consider its consistency: convergence of a point estimator to the ‘true’ $u^\dagger$; contraction of the posterior distribution, i.e. do we have $\Pi(u : d(u, u^\dagger) > \delta_n \mid m) \to 0$ in $P_{u^\dagger}$-probability, for some $\delta_n \to 0$, as the sample size $n \to \infty$? Is an optimal contraction rate enough to guarantee that Bayesian credible sets have correct frequentist coverage?

  16. Credible sets do not necessarily cover the truth well. Figure panels: correctly specified prior; prior misspecified on the boundary. (Monard, Nickl & Paternain, The Annals of Statistics, 2019)

  17. Do credible sets quantify frequentist uncertainty? Do we have, for $C = C(m)$, $\Pi(u \in C \mid m) \approx 0.95 \;\Leftrightarrow\; P_{u^\dagger}\big(u^\dagger \in C(m^\dagger)\big) \approx 0.95$?

  18. Bernstein–von Mises theorem (BvM). For large sample size $n$, with $\hat u_{MLE}$ the maximum likelihood estimator, $\Pi(\cdot \mid m) \approx N\big(\hat u_{MLE}, \tfrac{1}{n} I(u^\dagger)^{-1}\big)$ for $M \sim P_{u^\dagger}$, whenever $u^\dagger \in O \subset \mathbb{R}^d$, the prior $\Pi$ has positive density on $O$, and the Fisher information $I(u^\dagger)$ is invertible.
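
A quick way to see BvM at work is the simplest conjugate model, Bernoulli observations with a uniform prior, where the exact Beta posterior can be compared with the normal approximation $N(\hat u_{MLE}, I(u^\dagger)^{-1}/n)$. The sketch below is my own illustration, not part of the talk:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Bernoulli(theta) observations with a uniform Beta(1,1) prior.
theta_true = 0.3
n = 2000
x = rng.binomial(1, theta_true, size=n)
s = x.sum()

theta_mle = s / n                                   # maximum likelihood estimator
fisher = 1.0 / (theta_true * (1 - theta_true))      # Fisher information I(theta_true)

posterior = stats.beta(1 + s, 1 + n - s)            # exact conjugate posterior
bvm = stats.norm(theta_mle, np.sqrt(1.0 / (n * fisher)))  # BvM normal approximation

# For large n the two distributions assign nearly the same probabilities.
grid = np.linspace(0.25, 0.35, 5)
print("posterior cdf :", np.round(posterior.cdf(grid), 3))
print("BvM normal cdf:", np.round(bvm.cdf(grid), 3))
```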

  19. BvM guarantees confident credible sets. The contraction rate of the posterior distribution near $u^\dagger$ is $\Pi\big(u : \|u - u^\dagger\|_{\mathbb{R}^d} \ge L_n/\sqrt{n} \mid m\big) \to 0$ in $P_{u^\dagger}$-probability as $L_n, n \to \infty$. For fixed $d$ and large $n$, computing posterior probabilities is roughly the same as computing them from $N\big(\hat u_{MLE}, \tfrac{1}{n} I(u^\dagger)^{-1}\big)$.

  20. Consequently, if $C_n$ is such that $\Pi(u \in C_n \mid M) = 0.95$ (Bayesian credible set), then $P_{u^\dagger}(u^\dagger \in C_n) \to 0.95$ (frequentist confidence set), and $|C_n|_{\mathbb{R}^d} = O_{P_{u^\dagger}}(1/\sqrt{n})$ (optimal diameter).

  21. Asymptotic normality of the Tikhonov regulariser. We return to the Gaussian example, where the posterior is also Gaussian. The posterior mean $\bar u$ equals the MAP estimate, which equals the Tikhonov regulariser $\bar u = \arg\min_u \big( \|Au - m\|_{\mathbb{R}^k}^2 + \|u\|_{\Sigma}^2 \big)$. Then the following convergence holds under $P_{u^\dagger}$: $\sqrt{n}\,(\bar u - u^\dagger) \to Z \sim N(0, I(u^\dagger)^{-1})$ as $n \to \infty$.
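
The identity "posterior mean = MAP = Tikhonov minimiser" in the Gaussian case can be checked numerically: minimise the Tikhonov functional and compare with the closed-form posterior mean. An illustrative sketch with arbitrary small dimensions (my own, not the speaker's code):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# Small Gaussian model: the posterior mean, the MAP estimate and the
# Tikhonov minimiser should all coincide.
d, k = 8, 6
A = rng.standard_normal((k, d))
Sigma_inv = np.eye(d)          # prior precision, i.e. ||u||_Sigma^2 = u^T Sigma^{-1} u
u_true = rng.standard_normal(d)
m = A @ u_true + rng.standard_normal(k)

# Closed-form Gaussian posterior mean: (A^T A + Sigma^{-1})^{-1} A^T m.
post_mean = np.linalg.solve(A.T @ A + Sigma_inv, A.T @ m)

# Tikhonov functional ||Au - m||^2 + ||u||_Sigma^2, minimised numerically.
J = lambda u: np.sum((A @ u - m)**2) + u @ Sigma_inv @ u
u_tik = minimize(J, np.zeros(d)).x

print("max |posterior mean - Tikhonov minimiser| =",
      np.max(np.abs(post_mean - u_tik)))   # should be numerically small
```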

  22. Confident credible sets. We can now construct a confidence set for the Tikhonov regulariser. Consider a credible set $C_n = \{u \in \mathbb{R}^d : \|u - \bar u\| \le R_n/\sqrt{n}\}$, with $R_n$ such that $\Pi(C_n \mid m) = 0.95$. Then the frequentist coverage probability of $C_n$ will satisfy $R_n \to \Phi^{-1}(0.95)$ in $P_{u^\dagger}$-probability and $P_{u^\dagger}(u^\dagger \in C_n) \to 0.95$ as $n \to \infty$. Here $\Phi^{-1}$ is a continuous inverse of $\Phi = P(Z \le \cdot)$ with $Z \sim N(0, I(u^\dagger)^{-1})$.
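
The statement can be illustrated by simulation: for a toy model with direct Gaussian observations and a Gaussian prior, build the 95% credible interval from the posterior and count how often it covers $u^\dagger$ over repeated data sets. A sketch under these simplifying assumptions (all values chosen for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Toy model: n direct observations m_i = u + N(0,1), Gaussian prior u ~ N(0, tau^2).
# The posterior is Gaussian, so the 95% credible interval is
# post_mean +/- q * post_sd with q the standard normal 0.975 quantile.
u_dagger, tau, n, reps = 1.7, 10.0, 200, 5000
q = stats.norm.ppf(0.975)

covered = 0
for _ in range(reps):
    m = u_dagger + rng.standard_normal(n)
    post_var = 1.0 / (n + 1.0 / tau**2)
    post_mean = post_var * n * m.mean()
    lo, hi = post_mean - q * np.sqrt(post_var), post_mean + q * np.sqrt(post_var)
    covered += (lo <= u_dagger <= hi)

print("frequentist coverage of the 95% credible interval:", covered / reps)
# For this well-specified model the coverage is close to 0.95.
```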

  23. The discretisation of m is given by the measurement device, but the discretisation of u can be chosen freely. Figure: $m \in \mathbb{R}^k$ with k = 4; $u \in \mathbb{R}^n$ with n = 48.

  24. The discretisations are independent. Figure: $m \in \mathbb{R}^k$ with k = 8; $u \in \mathbb{R}^n$ with n = 156.

  25. The discretisations are independent. Figure: $m \in \mathbb{R}^k$ with k = 24; $u \in \mathbb{R}^n$ with n = 440.

  26. The measurement is always discrete, but the unknown is usually a continuous function. Figure: $m \in \mathbb{R}^4$; $u \in L^2$.

  27. For the theory we often want to use a continuous model: m = Au + ε.

  28. Nonparametric models. In many applications it is natural to use a statistical regression model $M_i = (AU)(x_i) + N_i$, $i = 1, \dots, n$, $N_i \sim N(0, 1)$, where $x_i \in O$ are measurement points and A is a forward operator. The goal is to infer U from the data $(M_i)$.

  29. For the theory we use a continuous model, which corresponds to $(x_i)$ growing dense in the domain O. If W is a Gaussian white noise process in the Hilbert space H, then $M = AU + \varepsilon W$, with noise level $\varepsilon = 1/\sqrt{n}$, and $M \sim P_{u^\dagger}$. Note that usually $Au \in L^2$, but $W \in H^{-s}$ only for $s > d/2$.
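
A sketch of generating data from the discrete regression model, taking A to be convolution with a Gaussian kernel purely for illustration (an assumption of mine, not the talk's); it produces the kind of discrete data set that the white noise model with noise level ε = 1/√n idealises:

```python
import numpy as np

rng = np.random.default_rng(6)

def AU(x, u_func, width=0.05, quad_pts=400):
    """Apply the (assumed) forward operator: convolution with a Gaussian kernel."""
    t = np.linspace(0, 1, quad_pts)
    kern = np.exp(-(x[:, None] - t[None, :])**2 / (2 * width**2))
    kern /= kern.sum(axis=1, keepdims=True)
    return kern @ u_func(t)

u_dagger = lambda t: np.sin(2 * np.pi * t) + 0.5 * np.cos(6 * np.pi * t)

n = 500
x_i = np.sort(rng.uniform(0, 1, n))              # measurement points growing dense in O
M = AU(x_i, u_dagger) + rng.standard_normal(n)   # M_i = (AU)(x_i) + N_i

# The continuous model M = AU + (1/sqrt(n)) W idealises this data set:
# averaging roughly n observations per unit length leaves noise of size ~ 1/sqrt(n).
print("number of observations:", M.shape[0], "  noise level 1/sqrt(n) =", 1 / np.sqrt(n))
```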

  30. Gaussian priors are often used for inverse problems. Gaussian priors $\Pi$ are common in practice: see e.g. Kaipio & Somersalo (2005), Stuart (2010), Dashti & Stuart (2016).

  31. Using the Cameron–Martin theorem we can formally write $d\Pi(\cdot \mid m) \propto e^{\ell(u)}\, d\Pi(u) \propto e^{\ell(u) - \frac{1}{2}\|u\|_{V_\Pi}^2}$, where $\ell(u) = \frac{1}{\varepsilon^2}\langle m, Au \rangle - \frac{1}{2\varepsilon^2}\|Au\|^2$, and $V_\Pi$ denotes the Cameron–Martin space of $\Pi$.
