Posterior consistency in Bayesian inference with exponential priors
Masoumeh Dashti, University of Sussex
Workshop on Optimization and Inversion under Uncertainty, Linz, 12 November 2019
Based on joint work with S. Agapiou (Cyprus) and T. Helin (LUT, Finland)
The setting

Suppose (indirect) noisy measurements y of a quantity of interest u are available:
    y = G(u) + η

Examples.
i)  y_j = u(x_j) + η_j,  j = 1, …, n,  x_j ∈ D ⊂ R^d,  u ∈ C_b(D)
ii) y_j = p(x_j) + η_j,  j = 1, …, n,  x_j ∈ D ⊂ R^d,  where ∇·(u∇p) = f in D, and u ∈ C_b(D) with u > 0.
Bayesian approach

Consider y = G(u) + η with u ∈ X, y ∈ R^n (X a separable Banach space),
• prior: u ∼ μ_0
• noise statistics known: η ∼ ρ_η

The posterior μ^y (when well defined*) satisfies
    μ^y(du) ∝ ρ_η(y − G(u)) μ_0(du),
i.e.
    μ^y(A) = c ∫_A ρ_η(y − G(u)) μ_0(du)  for all A ∈ B(X),
so that dμ^y/dμ_0 (u) = c ρ_η(y − G(u)).
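To make the formula concrete, here is a minimal one-dimensional numerical sketch of the Bayes rule μ^y(du) ∝ ρ_η(y − G(u)) μ_0(du). The concrete choices (the forward map G(u) = u², the noise level, the Laplace prior) are illustrative assumptions, not part of the talk:

```python
import numpy as np

# Minimal 1-D sketch of mu^y(du) ∝ rho_eta(y - G(u)) mu_0(du).
# All concrete choices below (G, sigma, the Laplace prior) are assumptions.
rng = np.random.default_rng(0)
sigma = 0.1
w0 = 0.5                                     # hypothetical truth

def G(u):
    return u ** 2                            # a simple nonlinear forward map

y = G(w0) + sigma * rng.standard_normal()    # one noisy observation

def unnormalised_posterior(u):
    likelihood = np.exp(-0.5 * (y - G(u)) ** 2 / sigma ** 2)  # rho_eta(y - G(u))
    prior = 0.5 * np.exp(-np.abs(u))                          # Laplace (p = 1) density
    return likelihood * prior

grid = np.linspace(-2.0, 2.0, 4001)
dx = grid[1] - grid[0]
weights = unnormalised_posterior(grid)
posterior = weights / (weights.sum() * dx)   # normalise on the grid
```

Since G(u) = u² is not injective, the resulting posterior density has mass near both +w_0 and −w_0, which anticipates the role of injectivity in the consistency theorem later in the talk.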
Posterior consistency

Suppose:
◮ y = (y_1, …, y_n)^T with n arbitrarily large,
◮ there exists an underlying truth w_0: y = G(w_0) + η.

Does μ^y concentrate on arbitrarily small neighbourhoods of w_0 as n → ∞, and how fast?

Simpler question: do modes of μ^y converge to w_0?
Outline
1. MAP estimators and weak posterior consistency
2. Posterior consistency with contraction rates
MAP estimates

μ(X) = 1, X a function space. There is no Lebesgue density, so modes can be defined topologically: any point ũ ∈ X satisfying
    lim_{ε→0} μ(B_ε(ũ)) / sup_{u∈X} μ(B_ε(u)) = 1
is a MAP estimator. (MD, Law, Stuart, Voss '13)
∃ Z ⊂ X s.t. for u ∈ Z
    lim_{ε→0} μ(B_ε(u)) / μ(B_ε(0)) = e^{−I(u)}.
◮ If X = R^n, then Z = R^n and I(u) = −log ρ_μ(u).
◮ For X a function space, Z is a proper dense subset of X with μ(Z) = 0.

Are modes of μ characterised by minimisers of I?
The Prior

X ⊂ L². Let {ψ_j} be an orthonormal basis of L²(T^d), ξ_j ∼ c_p exp(−|x|^p/p), p ≥ 1, i.i.d., and {γ_j} → 0 a positive decreasing sequence. μ_0 is the law of (γ_j ξ_j)_j, so u ∼ μ_0 satisfies
    u(x) = Σ_{j∈N} γ_j ξ_j ψ_j(x).

Gaussian: p = 2, {ψ_j} an orthonormal basis.
Besov (Lassas, Saksman, Siltanen '09): p ≥ 1, γ_j negative powers of j, {ψ_j} an orthonormal wavelet basis. The case p = 1 is sparsity promoting: a continuous but not differentiable measure.
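A short sketch of drawing one sample u = Σ_j γ_j ξ_j ψ_j from such a prior. The truncated cosine basis, the decay γ_j = j^{−α} and the truncation level J are illustrative assumptions; only the p = 1 (Laplace) and p = 2 (Gaussian) draws are implemented:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_xi(p, size, rng):
    # xi_j with density c_p exp(-|x|^p / p) for the two standard cases
    if p == 2:
        return rng.standard_normal(size)     # density ∝ exp(-x^2/2)
    if p == 1:
        return rng.laplace(size=size)        # density ∝ exp(-|x|)
    raise NotImplementedError("general p needs e.g. inverse-CDF sampling")

J, alpha, p = 256, 1.5, 1                    # assumed truncation and decay
j = np.arange(1, J + 1)
gamma = j ** (-alpha)

x = np.linspace(0.0, 1.0, 512)
psi = np.sqrt(2.0) * np.cos(np.outer(j * np.pi, x))  # orthonormal cosines on [0,1]
u = (gamma * sample_xi(p, J, rng)) @ psi             # one prior draw on the grid
```

Switching p between 1 and 2 with the same γ_j makes the visual difference between Besov-type (sparsity-promoting) and Gaussian draws easy to explore.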
For dμ/dμ_0 (u) = c e^{−Φ(u)} with Φ given,
    I(u) = Φ(u) + (1/p) ‖u‖_Z^p,
where
    Z := { u ∈ X : Σ_j |⟨u, ψ_j⟩/γ_j|^p < ∞ },   Q := { u ∈ X : Σ_j |⟨u, ψ_j⟩/γ_j|² < ∞ }.
For h ∈ Q, with ρ_j the density of γ_j ξ_j,
    dμ_{0,h}/dμ_0 (u) = lim_{N→∞} Π_{j=1}^N ρ_j(u_j − h_j)/ρ_j(u_j)
                      = lim_{N→∞} exp( Σ_{j=1}^N −|(u_j − h_j)/γ_j|^p + |u_j/γ_j|^p )   in L¹_μ.

For locally Lipschitz Φ, modes of μ are minimisers of I:
• p = 2: MD, Law, Stuart, Voss '13 (Z = Q)
• p > 1: Helin & Burger '15; Lie & Sullivan '18 (differentiable)
• p = 1: Agapiou, Burger, MD, Helin '18
Weak posterior consistency

dμ^y/dμ_0 (u) ∝ ρ_η(y − G(u)) =: e^{−Φ(u,y)}.

Suppose:
◮ y = (y_1, …, y_n)^T with n arbitrarily large,
◮ there exists an underlying truth: y_j = G(w_0) + η_j.

For μ_0 exponential, MAP estimates are
    u_n := argmin_{u∈Z} Φ(u, y) + ‖u‖_Z.
u_n := argmin_{u∈Z} Φ(u, y) + ‖u‖_Z
     = argmin_{u∈Z} |G(w_0) − G(u)|² + (2/n) Σ_{j=1}^n ⟨G(w_0) − G(u), η_j⟩ + (1/n) ‖u‖_Z

Theorem (Agapiou, Burger, D, Helin '18). Assume that G : X → R is locally Lipschitz and w_0 ∈ Z. Then
• G(u_n) → G(w_0) in probability.
• If G is injective, ‖u_n − w_0‖_X → 0 in probability. Otherwise, there exist u* ∈ Z and a subsequence of {u_n}_{n∈N} along which ‖u_n − u*‖_X → 0 in probability; for any such u*, G(u*) = G(w_0).

The small-noise limit y = G(w_0) + δ_n η behaves similarly.
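The consistency statement can be checked numerically in a deliberately simple toy case. Under the simplifying assumptions X = R, G the identity (injective) and standard Gaussian noise with a Laplace (p = 1) prior, the MAP estimate u_n = argmin_u Σ_j (y_j − u)²/2 + |u| has the closed form of a soft threshold of the sample mean:

```python
import numpy as np

# Toy check of weak consistency, assuming X = R, G = id, Gaussian noise,
# Laplace prior: u_n = soft_threshold(mean(y), 1/n).
rng = np.random.default_rng(2)
w0 = 0.7                                   # the underlying truth

def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def map_estimate(n, rng):
    y = w0 + rng.standard_normal(n)        # y_j = G(w0) + eta_j
    return soft_threshold(y.mean(), 1.0 / n)

errors = [abs(map_estimate(n, rng) - w0) for n in (10, 1000, 100000)]
```

As n grows, the error |u_n − w_0| shrinks, matching the ‖u_n − w_0‖_X → 0 conclusion for injective G.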
Outline
1. MAP estimators and weak posterior consistency
2. Posterior consistency with contraction rates
Consistency with contraction rates

μ^y is said to contract with rate ε_n at w_0 if
    μ^y( { u ∈ X : ‖u − w_0‖ ≥ C ε_n } ) → 0   in P(y | w_0)-probability.

Ghosal, Ghosh & van der Vaart '00 give sufficient conditions on the model and the prior to ensure this.

Conditions on the prior:
• μ_0 puts sufficient mass around w_0,
• the distribution of mass under μ_0 is 'not too complex'.

Model: i.i.d. sampling or the white noise model.
Conditions on prior – exponential case

Agapiou, MD & Helin '18: for appropriate ε_n (*), there exists X_n ⊂ X s.t.
◮ μ_0(‖u − w_0‖_X < 2ε_n) ≥ e^{−n ε_n²},
◮ log N(ε̃_n, X_n, ‖·‖_X) ≤ C n ε̃_n²   (N: minimal number of balls needed to cover X_n),
◮ μ_0(X \ X_n) ≤ e^{−C n ε_n²}.
————————————————–
(*) ε_n satisfies φ_{w_0}(ε_n) ≤ n ε_n², with
    φ_w(ε) := inf_{h ∈ Z : ‖h−w‖_X ≤ ε} (1/p) ‖h‖_Z^p − log μ_0(ε B_X).
(Based on van der Vaart & van Zanten '08 for the Gaussian case.)
• For h ∈ Z,
    μ_0(ε B_X + h) ≥ e^{−(1/p) ‖h‖_Z^p} μ_0(ε B_X).
• By a two-level Talagrand inequality (1994)¹: for all M > 0,
    μ(A + M^{p/2} B_Q + M B_Z) ≥ 1 − (1/μ(A)) exp(−c M^p)
→ choose X_n = ε B_X + M_n^{p/2} B_Q + M_n B_Z with M_n ∝ (n ε_n²)^{1/p}.
¹ generalised Borell's inequality
Contraction rates

Find the largest ε_n s.t. φ_{w_0}(ε_n) ≤ n ε_n².

• For the white noise model y_t^n = ∫_0^t u(s) ds + (1/√n) B_t, t ∈ [0,1], with truth w_0 ∈ B^β_{qq} and prior the B^{α+1/p}_{pp} Besov measure,
    ε_n = c n^{−β/(1+2β+p(α−β))}   if β ≤ α,
    ε_n = c n^{−α/(1+2α)}          if β > α,
and
    μ^y( { u ∈ X : ‖u − w_0‖ ≥ C ε_n } ) → 0   in P(y | w_0)-probability.

• Upper bounds on μ_0(ε B_X + h) enable the study of lower bounds on concentration rates: work in progress, recently established for p = 1.
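Reading the display above as ε_n ≍ n^{−r(α,β,p)}, the exponent can be tabulated in a few lines; the function below simply restates the slide's formula, so its correctness rests on that reading:

```python
# Contraction-rate exponent r in eps_n ≍ n^{-r(alpha, beta, p)},
# as read off the displayed formula (assumed reading of the slide).
def rate_exponent(alpha: float, beta: float, p: float) -> float:
    if beta <= alpha:
        return beta / (1 + 2 * beta + p * (alpha - beta))
    return alpha / (1 + 2 * alpha)
```

Two sanity checks: the branches agree at β = α (both give α/(1+2α)), and for p = 2 with β ≤ α one recovers the Gaussian-prior rate exponent β/(1+2β+2(α−β)).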
Final remarks
• Convergence rates of MAPs for Gaussian priors: Nickl, van de Geer, Wang '19
• Posterior contraction for nonlinear forward operators:
  – Vollmer '13: pushforward of μ_0 under G, elliptic inverse problem
  – Nickl '17: Bernstein–von Mises theorem, elliptic inverse problem
• Generalised MAPs for discontinuous priors: Clason, Helin, Kretschmann, Piiroinen '19