Posterior consistency in Bayesian inference with exponential priors


1. Posterior consistency in Bayesian inference with exponential priors
Masoumeh Dashti, University of Sussex
Workshop on Optimization and Inversion under Uncertainty, Linz, 12 November 2019
Based on joint work with S. Agapiou (Cyprus) and T. Helin (LUT, Finland)

2-4. The setting
Suppose (indirect) noisy measurements $y$ of a quantity of interest $u$ are available:
$$y = G(u) + \eta.$$
Examples.
i) $y_j = u(x_j) + \eta_j$, $j = 1, \dots, n$, $x_j \in D \subset \mathbb{R}^d$, with $u \in C_b(D)$;
ii) $y_j = p(x_j) + \eta_j$, $j = 1, \dots, n$, $x_j \in D \subset \mathbb{R}^d$, where $\nabla \cdot (u \nabla p) = f$ in $D$, with $u \in C_b(D)$ and $u > 0$.
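
A minimal numerical sketch of example (i), not from the slides: the test function, the random design, and the Gaussian noise level are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def u_true(x):
    # illustrative smooth quantity of interest on D = [0, 1]
    return np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)

n = 50
x = rng.uniform(0.0, 1.0, size=n)      # design points x_j in D
sigma = 0.1                            # assumed noise level
eta = sigma * rng.standard_normal(n)   # noise eta_j
y = u_true(x) + eta                    # data y_j = u(x_j) + eta_j
```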

5. Bayesian approach
Consider $y = G(u) + \eta$ with $u \in X$, $y \in \mathbb{R}^n$ ($X$ a separable Banach space), with
• prior: $u \sim \mu_0$,
• known noise statistics: $\eta \sim \rho_\eta$.
The posterior $\mu^y$ (when well-defined*) satisfies
$$\mu^y(\mathrm{d}u) \propto \rho_\eta(y - G(u))\, \mu_0(\mathrm{d}u),$$
i.e. $\mu^y(A) = c \int_A \rho_\eta(y - G(u))\, \mu_0(\mathrm{d}u)$ for all $A \in \mathcal{B}(X)$, so that $\frac{\mathrm{d}\mu^y}{\mathrm{d}\mu_0}(u) = c\, \rho_\eta(y - G(u))$.
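
Since the posterior is specified only relative to the prior, a discretised sketch can work directly with this density ratio, e.g. by reweighting prior draws by $\rho_\eta(y - G(u))$. Everything named here (Gaussian noise, the helper functions) is an illustrative assumption, not the talk's implementation.

```python
import numpy as np

def log_likelihood(u, y, G, sigma=0.1):
    # log rho_eta(y - G(u)) for assumed i.i.d. N(0, sigma^2) noise,
    # up to the additive normalising constant
    r = y - G(u)
    return -0.5 * np.sum(r**2) / sigma**2

def posterior_weights(prior_draws, y, G, sigma=0.1):
    # importance-style weights w_i, proportional to the Radon-Nikodym
    # derivative d(mu^y)/d(mu_0) evaluated at prior samples u_i
    logw = np.array([log_likelihood(u, y, G, sigma) for u in prior_draws])
    logw -= logw.max()                 # stabilise before exponentiating
    w = np.exp(logw)
    return w / w.sum()
```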

6-8. Posterior consistency
Suppose:
• $y = (y_1, \dots, y_n)^\top$ with $n$ arbitrarily large;
• there exists an underlying truth: $y = G(w_0) + \eta$.
Does $\mu^y$ concentrate on arbitrarily small neighbourhoods of $w_0$ as $n \to \infty$, and how fast?
Simpler question: do modes of $\mu^y$ converge to $w_0$?

9-10. Outline
1. MAP estimators and weak posterior consistency
2. Posterior consistency with contraction rates

11. MAP estimates
$\mu(X) = 1$, $X$ a function space. There is no Lebesgue density, so modes can be defined topologically: any point $\tilde{u} \in X$ satisfying
$$\lim_{\epsilon \to 0} \frac{\mu(B_\epsilon(\tilde{u}))}{\sup_{u \in X} \mu(B_\epsilon(u))} = 1$$
is a MAP estimator. (MD, Law, Stuart, Voss '13)

12-13. There exists $Z \subset X$ s.t. for $u \in Z$,
$$\lim_{\epsilon \to 0} \frac{\mu(B_\epsilon(u))}{\mu(B_\epsilon(0))} = e^{-I(u)}.$$
• If $X = \mathbb{R}^n$, then $Z = \mathbb{R}^n$ and $I(u) = -\log \rho_\mu(u)$.
• For $X$ a function space, $Z$ is a proper dense subset of $X$ with $\mu(Z) = 0$.
Are modes of $\mu$ characterised by minimisers of $I$?
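
A quick Monte Carlo check of the finite-dimensional case (my toy example, not from the talk): for the standard Laplace measure on $\mathbb{R}$, the small-ball ratio $\mu(B_\epsilon(u))/\mu(B_\epsilon(0))$ should approach $\rho_\mu(u)/\rho_\mu(0) = e^{-|u|}$.

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.laplace(0.0, 1.0, size=5_000_000)   # mu = standard Laplace on R

def ball_prob(center, eps):
    # Monte Carlo estimate of mu(B_eps(center))
    return np.mean(np.abs(samples - center) < eps)

u, eps = 1.5, 0.01
print(ball_prob(u, eps) / ball_prob(0.0, eps))    # close to e^{-1.5}
print(np.exp(-u))                                 # 0.2231...
```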

14-15. The prior
$X \subset L^2(\mathbb{T}^d)$. Let $\{\psi_j\}$ be an orthonormal basis of $L^2(\mathbb{T}^d)$, let $\xi_j \sim c_p \exp(-|x|^p/p)$, $p \ge 1$, be i.i.d., and let $\{\gamma_j\}$ be a positive decreasing sequence with $\gamma_j \to 0$. Let $\mu_0$ be the law of $(\gamma_j \xi_j)_j$; then $u \sim \mu_0$ satisfies
$$u(x) = \sum_{j \in \mathbb{N}} \gamma_j \xi_j \psi_j(x).$$
• Gaussian: $p = 2$, $\{\psi_j\}$ an orthonormal basis.
• Besov (Lassas, Saksman, Siltanen '09): $p \ge 1$, $\gamma_j$ negative powers of $j$, $\{\psi_j\}$ an orthonormal wavelet basis; $p = 1$ gives a sparsity-promoting measure, continuous but not differentiable.
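
A sketch of drawing from such a prior. The Fourier sine basis and the decay $\gamma_j = j^{-\tau}$ are illustrative choices; scipy's gennorm has density proportional to $\exp(-|x|^p)$, so a rescaling by $p^{1/p}$ yields $\xi_j \sim c_p \exp(-|x|^p/p)$.

```python
import numpy as np
from scipy.stats import gennorm

def prior_draw(J=200, p=1.0, tau=1.5, n_grid=512, seed=None):
    # u(x) = sum_j gamma_j xi_j psi_j(x) with gamma_j = j^{-tau}
    rng = np.random.default_rng(seed)
    j = np.arange(1, J + 1)
    gamma = j ** (-tau)
    # gennorm samples have density prop. to exp(-|x|^p); rescale so that
    # xi_j has density c_p exp(-|x|^p / p)
    xi = p ** (1.0 / p) * gennorm.rvs(p, size=J, random_state=rng)
    x = np.linspace(0.0, 2.0 * np.pi, n_grid)
    psi = np.sin(np.outer(x, j)) / np.sqrt(np.pi)   # orthonormal in L^2(0, 2*pi)
    return x, psi @ (gamma * xi)

x, u = prior_draw(p=1.0, seed=0)   # p = 1: sparsity-promoting, Besov-type draw
```

For $p = 2$ this reduces to a Gaussian series; decreasing tau roughens the draws.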

16. For $\frac{\mathrm{d}\mu}{\mathrm{d}\mu_0}(u) = c\, e^{-\Phi(u)}$ with $\Phi$ given,
$$I(u) = \Phi(u) + \frac{1}{p}\|u\|_Z^p, \qquad Z := \Big\{ u \in X : \sum_j \Big|\frac{\langle u, \psi_j\rangle}{\gamma_j}\Big|^p < \infty \Big\}, \quad Q := \Big\{ u \in X : \sum_j \Big|\frac{\langle u, \psi_j\rangle}{\gamma_j}\Big|^2 < \infty \Big\}.$$
For $h \in Q$, with $\gamma_j \xi_j \sim \rho_j$ and $u_j = \langle u, \psi_j\rangle$, $h_j = \langle h, \psi_j\rangle$,
$$\frac{\mathrm{d}\mu_{0,h}}{\mathrm{d}\mu_0}(u) = \lim_{N \to \infty} \prod_{j=1}^N \frac{\rho_j(u_j - h_j)}{\rho_j(u_j)} = \lim_{N \to \infty} \exp\Big( \sum_{j=1}^N \frac{-|u_j - h_j|^p + |u_j|^p}{p\, \gamma_j^p} \Big) \quad \text{in } L^1.$$
For locally Lipschitz $\Phi$, modes of $\mu$ are minimisers of $I$:
• $p = 2$: MD, Law, Stuart, Voss '13 ($Z = Q$)
• $p > 1$: Helin & Burger '15; Lie & Sullivan '18 (differentiable)
• $p = 1$: Agapiou, Burger, MD, Helin '18
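
In coefficients $u_j = \langle u, \psi_j \rangle$ the functional $I$ is elementary to evaluate; a sketch in which $\Phi$ and the truncation level are placeholders supplied by the user:

```python
import numpy as np

def Z_norm_p(coeffs, gamma, p):
    # ||u||_Z^p = sum_j |<u, psi_j> / gamma_j|^p for a truncated expansion
    return np.sum(np.abs(coeffs / gamma) ** p)

def onsager_machlup(coeffs, gamma, p, Phi):
    # I(u) = Phi(u) + (1/p) ||u||_Z^p; Phi is any user-supplied potential
    return Phi(coeffs) + Z_norm_p(coeffs, gamma, p) / p
```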

17. Weak posterior consistency
$$\frac{\mathrm{d}\mu^y}{\mathrm{d}\mu_0}(u) \propto \rho_\eta(y - G(u)) =: e^{-\Phi(u, y)}.$$
Suppose:
• $y = (y_1, \dots, y_n)^\top$ with $n$ arbitrarily large;
• there exists an underlying truth, $y = G(w_0) + \eta$ (componentwise $y_j = G(w_0) + \eta_j$).
For $\mu_0$ exponential, MAP estimates are
$$u_n := \operatorname*{argmin}_{u \in Z}\; \Phi(u, y) + \|u\|_Z.$$

18-19.
$$u_n := \operatorname*{argmin}_{u \in Z}\; \Phi(u, y) + \|u\|_Z = \operatorname*{argmin}_{u \in Z}\; |G(w_0) - G(u)|^2 + \frac{2}{n} \sum_{j=1}^n \big\langle G(w_0) - G(u), \eta_j \big\rangle + \frac{1}{n}\|u\|_Z.$$
Theorem (Agapiou, Burger, D, Helin '18). Assume that $G : X \to \mathbb{R}$ is locally Lipschitz and $w_0 \in Z$. Then
• $G(u_n) \to G(w_0)$ in probability;
• if $G$ is injective, $\|u_n - w_0\|_X \to 0$ in probability; otherwise, there exist $u^* \in Z$ and a subsequence of $\{u_n\}_{n \in \mathbb{N}}$ along which $\|u_n - u^*\|_X \to 0$ in probability, and any such $u^*$ satisfies $G(u^*) = G(w_0)$.
The small noise limit $y = G(w_0) + \delta_n \eta$ is similar.
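
A toy illustration of the theorem's first conclusion (all choices mine: a scalar linear $G(u) = \langle a, u \rangle$, hence non-injective, Gaussian noise, $p = 1$, and a derivative-free optimiser since the $\ell^1$ term is non-smooth):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
J, sigma = 8, 0.5
gamma = np.arange(1, J + 1) ** (-1.5)     # prior weights gamma_j
a = rng.standard_normal(J)                # toy forward map G(u) = <a, u>
w0 = gamma * rng.laplace(size=J)          # truth w0 in Z

def map_estimate(n):
    y = a @ w0 + sigma * rng.standard_normal(n)       # y_j = G(w0) + eta_j
    def objective(c):
        Phi = np.sum((y - a @ c) ** 2) / (2.0 * sigma**2)
        return Phi + np.sum(np.abs(c / gamma))        # Phi(u, y) + ||u||_Z, p = 1
    return minimize(objective, np.zeros(J), method="Powell").x

for n in (10, 100, 1000):
    u_n = map_estimate(n)
    print(n, abs(a @ (u_n - w0)))   # G(u_n) -> G(w0) as n grows
```

Since this $G$ is not injective, only convergence of $G(u_n)$ is guaranteed, matching the theorem's dichotomy.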

20. Outline
1. MAP estimators and weak posterior consistency
2. Posterior consistency with contraction rates

21-22. Consistency with contraction rates
$\mu^y$ is said to contract with rate $\epsilon_n$ at $w_0$ if
$$\mu^y\big(\{u \in X : \|u - w_0\| \ge C \epsilon_n\}\big) \to 0 \quad \text{in } P(y \mid w_0)\text{-probability}.$$
Ghosal, Ghosh & van der Vaart '00 give sufficient conditions on model and prior to ensure this. Conditions on the prior:
• $\mu_0$ puts sufficient mass around $w_0$;
• the distribution of mass under $\mu_0$ is 'not too complex'.
Model: i.i.d. sampling or white noise model.

23-24. Conditions on prior – exponential case
Agapiou, MD & Helin '18: For appropriate $\epsilon_n$ (*), there exists $X_n \subset X$ s.t.
• $\mu_0(\|u - w_0\|_X < 2\epsilon_n) \ge e^{-n\epsilon_n^2}$;
• $\log N(\tilde{\epsilon}_n, X_n, \|\cdot\|_X) \le C n \tilde{\epsilon}_n^2$ ($N$: minimum number of balls needed to cover $X_n$);
• $\mu_0(X \setminus X_n) \le e^{-C n \epsilon_n^2}$.
(*) $\epsilon_n$ satisfies $\varphi_{w_0}(\epsilon_n) \le n \epsilon_n^2$, with
$$\varphi_w(\epsilon) := \inf_{h \in Z : \|h - w\|_X \le \epsilon} \frac{1}{p}\|h\|_Z^p - \log \mu_0(\epsilon B_X).$$
(Based on van der Vaart & van Zanten '08 for Gaussian priors.)

25.
• For $h \in Z$: $\mu_0(\epsilon B_X + h) \ge e^{-\frac{1}{p}\|h\|_Z^p}\, \mu_0(\epsilon B_X)$.
• By a two-level Talagrand inequality (1994)¹: for all $M > 0$,
$$\mu\big(A + M^{p/2} B_Q + M B_Z\big) \ge 1 - \frac{1}{\mu(A)} \exp(-c M^p).$$
→ choose $X_n = \epsilon B_X + M_n^{p/2} B_Q + M_n B_Z$ with $M_n \propto (n \epsilon_n^2)^{1/p}$.
¹ generalised Borell's inequality

26-27. Contraction rates
Find the largest $\epsilon_n$ s.t. $\varphi_{w_0}(\epsilon_n) \le n \epsilon_n^2$.
• For the white noise model $y_t^n = \int_0^t u(s)\,\mathrm{d}s + \frac{1}{\sqrt{n}} B_t$, $t \in [0, 1]$, with truth $w_0 \in B^\beta_{qq}$ and prior the $B^{\alpha+1/p}_{pp}$-Besov measure,
$$\epsilon_n = \begin{cases} c\, n^{-\beta/(1 + 2\beta + p(\alpha - \beta))}, & \text{if } \beta \le \alpha, \\ c\, n^{-\alpha/(1 + 2\alpha)}, & \text{if } \beta > \alpha, \end{cases}$$
and $\mu^y\big(\{u \in X : \|u - w_0\| \ge C \epsilon_n\}\big) \to 0$ in $P(y \mid w_0)$-probability.
• Upper bounds on $\mu_0(\epsilon B_X + h)$ enable the study of lower bounds on contraction rates: work in progress, recently established for $p = 1$.
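
To make the two regimes concrete, a small helper evaluating the exponent $r$ in $\epsilon_n = c\, n^{-r}$ (the function name is mine, the formula is the one displayed above):

```python
def contraction_exponent(alpha, beta, p):
    # rate epsilon_n = c * n^{-r} for truth smoothness beta, prior parameter alpha
    if beta <= alpha:
        return beta / (1 + 2 * beta + p * (alpha - beta))
    return alpha / (1 + 2 * alpha)        # saturation at the prior's own rate

print(contraction_exponent(alpha=2.0, beta=1.0, p=1.0))  # rough truth: 0.25
print(contraction_exponent(alpha=1.0, beta=2.0, p=1.0))  # smooth truth: 1/3
```

For $p = 2$ and $\beta \le \alpha$ the exponent reduces to the familiar Gaussian value $\beta/(1 + 2\alpha)$.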

28. Final remarks
• Convergence rates of MAPs for Gaussian priors: Nickl, van de Geer, Wang '19.
• Posterior contraction for nonlinear forward operators: Vollmer '13 (pushforward of $\mu_0$ under $G$: elliptic inverse problem); Nickl '17 (Bernstein-von Mises theorem: elliptic inverse problem).
• Generalised MAPs for discontinuous priors: Clason, Helin, Kretschmann, Piiroinen '19.
