

  1. The Convergence of the Laplace Approximation and Noise-Level-Robust Monte Carlo Methods for Bayesian Inverse Problems
  Daniel Rudolf, Claudia Schillings, Björn Sprungk, Philipp Wacker
  Institute of Mathematical Stochastics, University of Göttingen
  Workshop “Optimization and Inversion under Uncertainty”, RICAM Linz, November 15th, 2019

  2. Bayesian Inverse Problems
  - Infer an unknown x ∈ R^d given noisy observations of a forward map G : R^d → R^J:
        y = G(x) + ε,   ε ∼ N(0, n^{-1} Σ),   n ∈ N.
  - Given a prior measure µ_0 for x, here µ_0 = N(0, C_0), we obtain the posterior
        µ_n(dx) = (1/Z_n) exp(−n Φ(x)) µ_0(dx),   Φ(x) = (1/2) |y − G(x)|²_{Σ^{-1}},
    where Z_n := ∫_{R^d} e^{−n Φ(x)} dµ_0 (a small code sketch of this density follows below).
  - Objective: sample (approximately) from µ_n and compute
        E_{µ_n}[f] = ∫_{R^d} f(x) µ_n(dx),   f ∈ L¹_{µ_0}(R^d).
  - In this talk we are interested in the case of increasing precision, n → ∞.
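To make the setup concrete, here is a minimal Python sketch (not part of the slides) of the log of the unnormalized posterior density exp(−n Φ(x)) times the prior density. The forward map is the one from the examples slide later in the talk, while the data y and the unit covariances Σ and C_0 are placeholder choices.

```python
# Minimal sketch (not from the talk): unnormalized log-posterior for the setup of slide 2.
import numpy as np

def G(x):
    """Toy forward map G: R^2 -> R^2 (taken from the examples slide)."""
    return np.array([np.exp((x[1] - x[0]) / 5.0), np.sin(x[1] - x[0])])

def Phi(x, y, Sigma_inv):
    """Misfit Phi(x) = 0.5 * |y - G(x)|^2 in the Sigma^{-1}-weighted norm."""
    r = y - G(x)
    return 0.5 * r @ Sigma_inv @ r

def log_post_unnormalized(x, y, Sigma_inv, C0_inv, n):
    """log( exp(-n*Phi(x)) * N(0, C0) prior density ), up to additive constants."""
    return -n * Phi(x, y, Sigma_inv) - 0.5 * x @ C0_inv @ x

# Placeholder data and unit covariances (hypothetical values)
y = np.array([1.1, 0.3])
print(log_post_unnormalized(np.zeros(2), y, np.eye(2), np.eye(2), n=100))
```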

  3.–7. Computational Bayesian Inference
  Computational methods for approximate sampling or integration w.r.t. µ:
  - Markov chain Monte Carlo,
  - Importance sampling,
  - Sequential Monte Carlo and particle filters,
  - Quasi-Monte Carlo and numerical quadrature, ...
  Common computational challenges:
  1. Expensive evaluation of the forward model G
     → multilevel or surrogate methods
  2. High-dimensional or even infinite-dimensional state space, e.g., function spaces
     → intense research in recent years for all of the methods mentioned
  3. Concentrated µ_n due to informative data, i.e., n ≫ 1 or J ≫ 1
     → analyzed so far in [Beskos et al., 2018] and [Schillings & Schwab, 2016]

  8. Outline
  1. Laplace Approximation
  2. Markov Chain Monte Carlo
  3. Importance Sampling
  4. Quasi Monte Carlo

  9. Next: Laplace Approximation (part 1 of the outline)

  10.–12. General Approach for Noise-Level-Robust Sampling
  - Prior-based sampling or integration will suffer from the increasing difference between µ_n and µ_0 as n → ∞, i.e.,
        dµ_n/dµ_0 ∝ e^{−n Φ},   µ_n → δ_{argmin Φ},   d_TV(µ_n, µ_0) → 1.
  - Idea: base the sampling methods on a suitable (simple) reference measure that mimics the (increasing) concentration of µ_n.
  - Here: the Laplace approximation of µ_n,  L_{µ_n} := N(x_n, C_n),  with
        x_n := argmin_x { n Φ(x) + (1/2) ‖C_0^{-1/2} x‖² },   C_n := ( n ∇²Φ(x_n) + C_0^{-1} )^{-1}
    (a numerical sketch of how x_n and C_n can be computed follows below).
  - This is a very common approximation in Bayesian statistics and OED
    ([Long et al., 2013], [Alexanderian et al., 2016], [Chen & Ghattas, 2017], ...).
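As a rough illustration of how x_n and C_n can be obtained in practice, the following sketch minimizes the regularized misfit with scipy and builds C_n from a finite-difference Hessian of Φ. This is not the speakers' implementation; the optimizer choice, the step size eps, and the quadratic toy misfit in the usage example are assumptions.

```python
# Minimal sketch (assumptions: smooth Phi, scipy available): Laplace approximation
#   x_n = argmin_x { n*Phi(x) + 0.5*||C0^{-1/2} x||^2 },  C_n = (n*HessPhi(x_n) + C0^{-1})^{-1}.
import numpy as np
from scipy.optimize import minimize

def laplace_approximation(Phi, n, C0_inv, x0, eps=1e-4):
    obj = lambda x: n * Phi(x) + 0.5 * x @ C0_inv @ x      # negative log-posterior, up to constants
    x_n = minimize(obj, x0, method="BFGS").x
    d = x_n.size
    I = np.eye(d)
    H = np.zeros((d, d))                                   # central finite-difference Hessian of Phi
    for i in range(d):
        for j in range(d):
            H[i, j] = (Phi(x_n + eps*I[i] + eps*I[j]) - Phi(x_n + eps*I[i] - eps*I[j])
                       - Phi(x_n - eps*I[i] + eps*I[j]) + Phi(x_n - eps*I[i] - eps*I[j])) / (4 * eps**2)
    C_n = np.linalg.inv(n * H + C0_inv)
    return x_n, C_n

# Usage with a quadratic toy misfit (the posterior is then Gaussian, so L_mu_n is exact):
Phi = lambda x: 0.5 * np.sum((np.array([1.0, 0.0]) - x) ** 2)
x_n, C_n = laplace_approximation(Phi, n=50, C0_inv=np.eye(2), x0=np.zeros(2))
print(x_n)   # close to (50/51, 0)
print(C_n)   # close to I/51
```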

  13.–14. Laplace’s Method for Asymptotics of Integrals
  [Laplace, 1774], [Wong, 2001]: Considering integrals
        J(n) := ∫_D f(x) exp(−n Φ(x)) dx,   D ⊆ R^d,
  with sufficiently smooth f and Φ, we have, under suitable conditions, as n → ∞,
        J(n) = e^{−n Φ(x⋆)} n^{−d/2} ( f(x⋆) / √det((2π)^{−1} H⋆) + O(n^{−1}) ),
  where x⋆ := argmin_{x ∈ R^d} Φ(x) ∈ D and H⋆ := ∇²Φ(x⋆) > 0.
  This yields: given a smooth Lebesgue density of µ_0, then for suitable f
        | ∫_{R^d} f dµ_n − ∫_{R^d} f dN(x⋆, (n H⋆)^{−1}) | ∈ O(n^{−1}).
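A quick numerical sanity check of Laplace's method in d = 1 (not from the slides): compare quadrature of J(n) with the leading-order term, which for d = 1 reads e^{−n Φ(x⋆)} √(2π/(n H⋆)) f(x⋆). The choices of Φ, f, and the integration interval below are toy assumptions.

```python
# Minimal sketch: Laplace's method vs. numerical quadrature in one dimension.
import numpy as np
from scipy.integrate import quad

Phi = lambda x: 0.5 * (x - 1.0) ** 2 + 0.1 * (x - 1.0) ** 4   # minimizer x* = 1, Phi''(x*) = 1
f = lambda x: 1.0 + x ** 2
x_star, H_star = 1.0, 1.0

for n in [10, 100, 1000]:
    # 'points' tells the adaptive quadrature where the sharp peak sits
    J, _ = quad(lambda x: f(x) * np.exp(-n * Phi(x)), -9.0, 11.0, points=[x_star], limit=200)
    J_laplace = np.exp(-n * Phi(x_star)) * np.sqrt(2.0 * np.pi / (n * H_star)) * f(x_star)
    print(n, J, J_laplace, abs(J - J_laplace) / J)            # relative error decays roughly like 1/n
```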

  15.–16. Convergence of the Laplace Approximation
  Theorem ([Schillings, S., Wacker, 2019]). Assume that
  - Φ ∈ C³(R^d), with x_n unique and C_n > 0 for all sufficiently large n > 0,
  - a unique minimizer x⋆ := argmin_{x ∈ R^d} Φ(x) exists with ∇²Φ(x⋆) > 0 and
        inf_{‖x − x⋆‖ > r} Φ(x) ≥ Φ(x⋆) + δ_r,   δ_r > 0,
  - lim_{n→∞} x_n = x⋆.
  Then d_H(µ_n, L_{µ_n}) ∈ O(n^{−1/2}).
  Closely related to the Bernstein–von Mises theorem, but:
  - the covariance of L_{µ_n} depends on the given data (BvM: Fisher information),
  - misspecification (“ground truth” not in the prior support) is not important,
  - the density dµ_n/dL_{µ_n} also exists in Hilbert spaces (for Gaussian µ_0).

  17. Remarks
  The convergence theorem can be extended under suitable assumptions to
  1. any prior µ_0 which is absolutely continuous w.r.t. the Lebesgue measure,
  2. sequences of Φ_n, e.g.,
        Φ_n(x) = (1/(2n)) Σ_{i=1}^{n} ‖y_i − G(x)‖²,
  3. the underdetermined case G : R^d → R^J, J < d, if µ_0 is Gaussian and G acts only on a linear active subspace M with dim(M) ≤ J, i.e.,
        G(x + m) = G(x)   for all x ∈ R^d and all m ∈ M^⊥,
  4. approximations x̃_n, C̃_n of x_n, C_n such that ‖x_n − x̃_n‖, ‖C_n − C̃_n‖ ∈ O(n^{−1}).

  18. Examples
  Example 1: µ_0 = N(0, I_2), Φ(x) = (1/2) ‖y − G(x)‖², G(x) = [exp((x_2 − x_1)/5), sin(x_2 − x_1)]^⊤.
  [Figure: posterior µ_n vs. Laplace approximation L_{µ_n}; the Hellinger distance d_H(µ_n, L_{µ_n}) decays with empirical rate −0.50 as n increases.]
  Example 2: µ_0 = N(0, I_2), Φ(x) = (1/2) ‖0 − G(x)‖² with G(x) = x_2 − x_1².
  [Figure: posterior µ_n vs. L_{µ_n}; the Hellinger distance does not decay, empirical rate 0.00.]
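The decay rate reported for Example 1 can be checked, at least roughly, by evaluating both densities on a grid and approximating the Hellinger distance by quadrature. The sketch below is illustrative only; the data y, the grid, and the values of n are hypothetical choices, and a finer grid would be needed to push n much beyond 100.

```python
# Minimal sketch (grid-based, illustrative only): Hellinger distance between mu_n
# and its Laplace approximation for Example 1, expected to decay roughly like n^{-1/2}.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

y = np.array([1.0, 0.0])                       # hypothetical data

def Phi(x):
    """Misfit of Example 1, vectorized over the last axis of x (shape (..., 2))."""
    d = np.asarray(x)[..., 1] - np.asarray(x)[..., 0]
    return 0.5 * ((y[0] - np.exp(d / 5.0)) ** 2 + (y[1] - np.sin(d)) ** 2)

def hessian(f, x, eps=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    d, I = len(x), np.eye(len(x))
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + eps*I[i] + eps*I[j]) - f(x + eps*I[i] - eps*I[j])
                       - f(x - eps*I[i] + eps*I[j]) + f(x - eps*I[i] - eps*I[j])) / (4 * eps**2)
    return H

t = np.linspace(-4.0, 4.0, 401)                # quadrature grid (refine for larger n)
X = np.stack(np.meshgrid(t, t), axis=-1)
dA = (t[1] - t[0]) ** 2
log_prior = multivariate_normal(np.zeros(2), np.eye(2)).logpdf(X)

for n in [1, 10, 100]:
    x_n = minimize(lambda x: n * Phi(x) + 0.5 * x @ x, np.zeros(2)).x
    C_n = np.linalg.inv(n * hessian(Phi, x_n) + np.eye(2))
    post = np.exp(-n * Phi(X) + log_prior)
    post /= post.sum() * dA                    # normalize the posterior numerically
    lap = multivariate_normal(x_n, C_n).pdf(X)
    d_H = np.sqrt(0.5 * np.sum((np.sqrt(post) - np.sqrt(lap)) ** 2) * dA)
    print(n, d_H)
```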

  19. Next: Markov Chain Monte Carlo (part 2 of the outline)

  20. Markov Chain Monte Carlo (MCMC)
  Construct a Markov chain (X_m)_{m ∈ N} with invariant measure µ_n, i.e.,
        X_m ∼ µ_n  ⇒  X_{m+1} ∼ µ_n.
  Given suitable conditions, we have X_m → µ_n in distribution as m → ∞, and for f ∈ L²_{µ_0}(R^d)
        S_M(f) := (1/M) Σ_{m=1}^{M} f(X_m)  →  E_{µ_n}[f]   a.s. as M → ∞.
  The autocorrelation of the Markov chain affects the efficiency:
        M · E[ (S_M(f) − E_{µ_n}[f])² ]  →  Var_{µ_n}(f) ( 1 + 2 Σ_{m=1}^{∞} Corr(f(X_1), f(X_{1+m})) )   as M → ∞,
  where the factor in parentheses is the integrated autocorrelation time (IACT).
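In practice the IACT is estimated from the chain itself. Below is a minimal, illustrative estimator (a simple truncation heuristic, not the speakers' method; windowed estimators such as Geyer's are more robust), together with a check on an AR(1) chain whose IACT is known to be (1 + ρ)/(1 − ρ).

```python
# Minimal sketch: estimate the IACT 1 + 2*sum_{m>=1} Corr(f(X_1), f(X_{1+m})) from samples,
# truncating the sum at the first non-positive autocorrelation estimate.
import numpy as np

def iact(samples, max_lag=1000):
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()
    M, var = len(x), x.var()
    tau = 1.0
    for m in range(1, min(max_lag, M - 1)):
        acf_m = np.dot(x[:-m], x[m:]) / ((M - m) * var)    # autocorrelation estimate at lag m
        if acf_m <= 0.0:
            break
        tau += 2.0 * acf_m
    return tau

# Check on an AR(1) chain with correlation rho: its IACT is (1 + rho)/(1 - rho) = 19 for rho = 0.9
rng = np.random.default_rng(0)
rho, M = 0.9, 100_000
chain = np.zeros(M)
for m in range(1, M):
    chain[m] = rho * chain[m - 1] + np.sqrt(1.0 - rho**2) * rng.standard_normal()
print(iact(chain), (1.0 + rho) / (1.0 - rho))
```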

  21. Metropolis-Hastings (MH) algorithm [Metropolis et al., 1953]
  Given the current state X_m = x:
  1. draw a new state y according to the proposal kernel P(x, ·), i.e., Y_m ∼ P(x, ·);
  2. accept the proposed y with acceptance probability α(x, y), i.e., set
        X_{m+1} = y with probability α(x, y),   and   X_{m+1} = x with probability 1 − α(x, y).
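A minimal random-walk sketch of the algorithm (symmetric Gaussian proposal, so α(x, y) = min{1, π(y)/π(x)}) is given below. It is illustrative only and not the noise-level-robust, Laplace-based variant discussed in the talk; the toy target, step size, and chain length are assumptions.

```python
# Minimal sketch: random-walk Metropolis-Hastings with proposal P(x, .) = N(x, s^2 I).
import numpy as np

def metropolis_hastings(log_target, x0, n_steps, step_size, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    x = np.array(x0, dtype=float)
    logp = log_target(x)
    chain = np.empty((n_steps, x.size))
    for m in range(n_steps):
        y_prop = x + step_size * rng.standard_normal(x.size)   # step 1: draw Y_m ~ P(x, .)
        logp_prop = log_target(y_prop)
        if np.log(rng.uniform()) < logp_prop - logp:           # step 2: accept with probability alpha(x, y)
            x, logp = y_prop, logp_prop
        chain[m] = x
    return chain

# Toy target: exp(-n*Phi(x)) * exp(-0.5*||x||^2) with Phi(x) = 0.5*(x_1 - 1)^2 and n = 100
n = 100
log_target = lambda x: -n * 0.5 * (x[0] - 1.0) ** 2 - 0.5 * x @ x
samples = metropolis_hastings(log_target, x0=np.zeros(2), n_steps=20_000, step_size=0.1)
print(samples[5_000:].mean(axis=0))            # posterior mean is approximately (100/101, 0)
```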
