Thermodynamic Formalism and Uncertainty Quantification


  1. Thermodynamic Formalism and Uncertainty Quantification
     Luc Rey-Bellet, University of Massachusetts Amherst
     Quantissima III, Venice, August 2019
     Work supported by NSF and AFOSR

  2. Collaborators on related projects
     • Paul Dupuis (Brown University)
     • Markos Katsoulakis (UMass Amherst)
     • Sung-Ha Hwang (KAIST)
     • Peter Plechac (U. of Delaware)
     • Yannis Pantazis (FORTH Crete)
     • Jeremiah Birrell (UMass Amherst)
     • Panagiota Birmpa (UMass Amherst)
     • Konstantinos Gourgoulias (UMass Amherst)
     • Jinchao Feng (UMass Amherst)
     • Jie Wang (UMass Amherst)
     • Sosung Baek (KAIST)

  3. Some references:
     [1] K. Chowdhary and P. Dupuis: Distinguishing and integrating aleatoric and epistemic variation in uncertainty quantification. ESAIM: M2AN, 47:635–662, 2013.
     [2] R. Atar, K. Chowdhary, and P. Dupuis: Robust bounds on risk-sensitive functionals via Rényi divergence. SIAM/ASA Journal on UQ, 3:18–33, 2015.
     [3] P. Dupuis, M. A. Katsoulakis, Y. Pantazis, and P. Plecháč: Path-Space Information Bounds for Uncertainty Quantification and Sensitivity Analysis of Stochastic Dynamics. SIAM/ASA Journal on UQ, 4(1):80–111, 2016.
     [4] M. Katsoulakis, L. Rey-Bellet, and J. Wang: Scalable Information Inequalities for Uncertainty Quantification. J. Comp. Phys., 336(1):513–545, 2017.
     [5] K. Gourgoulias, M. Katsoulakis, L. Rey-Bellet, and J. Wang: How biased is your model? Concentration inequalities, information and model bias. To be published in IEEE Trans. Inf. Theory.

  4. [6] P. Dupuis, M. Katsoulakis, Y. Pantazis, and L. Rey-Bellet: Sensitivity Analysis for Rare Events based on Rényi Divergence. To be published in Ann. Appl. Prob.
     [7] J. Birrell and L. Rey-Bellet: Uncertainty Quantification for Markov Processes via Variational Principles and Functional Inequalities. Submitted. arXiv:1812.05174
     [8] J. Birrell and L. Rey-Bellet: Concentration Inequalities and Performance Guarantees for Hypocoercive Samplers. Submitted. arXiv:1907.11973
     [9] J. Birrell, M. Katsoulakis, and L. Rey-Bellet: Robustness of Dynamical Quantities of Interest via Goal-Oriented Information Theory. arXiv:1906.09282
     [10] S. Baek, S.-H. Hwang, and L. Rey-Bellet: Thermodynamic Formalism and Uncertainty Quantification. In preparation.
     • and several more to come.

  5. UQ framework: Baseline model
     → Baseline model $P$ (= probability measure on $\mathcal{X}$). Think of it as a (tractable) model you use to compute or do analysis. Maybe obtained after inference and/or model reduction, and so on...
     In the most interesting cases you should think of $P$ as high-dimensional, e.g.:
     • $P_\nu$ is the distribution of a process $\{X_t\}_{0\le t<\infty}$ with $X_0\sim\nu$.
     • $P$ is a Gibbs measure on $\Omega^{\mathbb{Z}^d}$.
     In any case, we think there are possibly many and large uncertainties in the model (model-form uncertainties):
     P IS NOT TO BE TRUSTED!!

  6. UQ framework: Quantities of interest
     Specific observables/statistics/quantities of interest = QoI:
     • $E_P[f]$ (expectation)
     • $\mathrm{Var}_P(f)$ (variance) or $\frac{\mathrm{Cov}_P(f,g)}{\sqrt{\mathrm{Var}_P(f)\,\mathrm{Var}_P(g)}}$ (correlation)
     • $\Lambda_{P,f}(c) = \log E_P[e^{cf}]$ (risk-sensitive functional; see the sketch below)
     • $\log P(A) \sim \log e^{-I(A)/\epsilon}$ (probability of some rare event)
     or maybe path-space QoI:
     • $E_{P_\nu}\left[\int_0^\tau f(X_t)\,dt\right]$, where $\tau$ is a stopping time
     • $E_{P_\nu}\left[\frac{1}{T}\int_0^T f(X_s)\,ds\right]$, that is, ergodic averages
     • $E_{P_\nu}\left[\int_0^\infty e^{-\lambda s} f(X_s)\,ds\right]$, that is, discounted observables
     • and so on...
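To make the risk-sensitive functional concrete, here is a minimal Monte Carlo sketch (my own illustration, not from the slides): it estimates $\Lambda_{P,f}(c) = \log E_P[e^{cf}]$ for a standard normal baseline $P$ and $f(x) = x$, where the exact value $c^2/2$ is available for comparison.

```python
# Minimal sketch (not from the talk): Monte Carlo estimate of the
# risk-sensitive functional Lambda_{P,f}(c) = log E_P[e^{c f(X)}],
# with P = N(0,1) and f(x) = x, so the exact value is c^2/2.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(10**6)   # samples from the baseline model P
f = X                            # quantity of interest f(X) = X

for c in (0.5, 1.0, 2.0):
    est = np.log(np.mean(np.exp(c * f)))
    print(f"c={c}: MC estimate {est:.4f}, exact {c**2/2:.4f}")
```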

  7. UQ framework: Non-parametric stress tests
     → Family of alternative models $\mathcal{Q}$. Think of it as describing the true but "unknowable" or only partially known models. Set
     $$\mathcal{Q}_\eta = \{\,Q : Q \text{ is } \eta\text{-"close" to } P\,\}$$
     Given a QoI $f$, can one find uncertainty bounds or performance guarantees
     $$\inf_{Q\in\mathcal{Q}_\eta} E_Q[f] \;\le\; E_P[f] \;\le\; \sup_{Q\in\mathcal{Q}_\eta} E_Q[f]\,?$$
     and similarly for other quantities? The bounds should be tight and computable (numerically or analytically).
     → Robustness, cf. the book by Hansen (Nobel 2013) and Sargent (Nobel 2011).
     → Stress tests in operations research, finance, etc.

  8. UQ framework: distances and divergences
     Which measure of distance, pseudo-distance, or divergence should one use?
     → Use information-theory concepts to measure the information loss between $Q$ and $P$.
     • Relative entropy (a.k.a. Kullback–Leibler divergence):
     $$R(Q\|P) = E_Q\left[\log\frac{dQ}{dP}\right]$$
     • Relative Rényi entropy (a.k.a. Rényi divergence): for $\alpha\ne 0,1$,
     $$R_\alpha(Q\|P) = \frac{1}{\alpha(\alpha-1)}\log E_P\left[\left(\frac{dQ}{dP}\right)^{\alpha}\right] = \frac{1}{\alpha(\alpha-1)}\log E_P\left[e^{\alpha\log\frac{dQ}{dP}}\right]$$
     Note that
     $$R_\alpha(Q\|P) \to \begin{cases} R(Q\|P) & \text{as } \alpha\to 1 \\ R(P\|Q) & \text{as } \alpha\to 0 \end{cases}$$
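A quick numerical sanity check of these definitions (my own sketch, on a finite state space): it computes $R(Q\|P)$ and $R_\alpha(Q\|P)$ directly from the formulas above and confirms both limits in $\alpha$.

```python
# Sketch (my own finite-state illustration): relative entropy and Renyi
# divergence computed from the definitions above, checking the limits
# R_alpha(Q||P) -> R(Q||P) as alpha -> 1 and -> R(P||Q) as alpha -> 0.
import numpy as np

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.4, 0.4, 0.2])

def kl(q, p):
    return float(np.sum(q * np.log(q / p)))

def renyi(q, p, alpha):
    # R_alpha(Q||P) = 1/(alpha(alpha-1)) * log E_P[(dQ/dP)^alpha]
    return float(np.log(np.sum(p * (q / p) ** alpha)) / (alpha * (alpha - 1)))

print("R(Q||P) =", kl(Q, P), " vs alpha=1.001:", renyi(Q, P, 1.001))
print("R(P||Q) =", kl(P, Q), " vs alpha=0.001:", renyi(Q, P, 0.001))
```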

  9. UQ framework: distances and divergences
     • Scalability: if $Q_{0:T}$ and $P_{0:T}$ are the distributions of the process restricted to the time window $0$ to $T$, then, typically,
     $$R_\alpha(Q_{0:T}\|P_{0:T}) = O(T) \quad\text{as } T\to\infty,$$
     i.e. information is additive. For the relative entropy we have the chain rule, which is even better (not asymptotic in $T$).
     • Information processing inequality: if $\mathcal{F}$ is a sub-$\sigma$-algebra, then
     $$R_\alpha(Q|_{\mathcal{F}} \,\|\, P|_{\mathcal{F}}) \le R_\alpha(Q\|P)$$
     • What is the right divergence for the QoI?
     • Not the whole story:
     → Heavy-tailed observables may require other entropies (f-divergences).
     → Wasserstein-type distances are needed if $Q \not\ll P$...
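A toy check of these two properties (my sketch; i.i.d. product measures are the simplest instance of the $O(T)$ scaling, and merging states illustrates the information processing inequality):

```python
# Sketch: additivity of relative entropy over a time window, in the simplest
# i.i.d. case R(Q^{(T)} || P^{(T)}) = T * R(Q||P), plus a data-processing
# check: coarse-graining the state space cannot increase the divergence.
import numpy as np

P = np.array([0.5, 0.3, 0.2]); Q = np.array([0.4, 0.4, 0.2])
kl = lambda q, p: np.sum(q * np.log(q / p))

T = 5
PT, QT = P.copy(), Q.copy()
for _ in range(T - 1):                        # build the T-fold products
    PT = np.outer(PT, P).ravel()
    QT = np.outer(QT, Q).ravel()
print(kl(QT, PT), "==", T * kl(Q, P))         # information is additive

Pc = np.array([P[0], P[1] + P[2]])            # merge states 1 and 2
Qc = np.array([Q[0], Q[1] + Q[2]])
print(kl(Qc, Pc), "<=", kl(Q, P))             # information processing
```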

  10. What is wrong with CKP? Scalability
     Csiszár–Kullback–Pinsker:
     $$|E_Q[f] - E_P[f]| \le \sqrt{2 R(Q\|P)}\;\|f - E_P[f]\|_\infty$$
     Take e.g. Markov measures $P = P_{0:T}$ and $Q = Q_{0:T}$ and
     $$F_T = \frac{1}{T}\int_0^T f(X_s)\,ds.$$
     Then $\|F_T\|_\infty = \|f\|_\infty = O(1)$ and $R(Q_{0:T}\|P_{0:T}) = O(T)$, and so
     $$|E_{Q_{0:T}}[F_T] - E_{P_{0:T}}[F_T]| \le \underbrace{\sqrt{2R(Q_{0:T}\|P_{0:T})}}_{=\,O(\sqrt{T})}\;\underbrace{\|F_T - E_P[F_T]\|_\infty}_{=\,O(1)}$$
     CKP does not scale correctly! Note though that
     $$\mathrm{Var}_{P_{0:T}}[F_T] = O\left(\frac{1}{T}\right),$$
     so one would need the variance instead of the sup norm.
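A toy numerical version of this scaling argument (my own example, using discrete-time two-state chains started in their stationary distributions, so that the path-space relative entropy grows like $T$ times the relative entropy rate):

```python
# Sketch: CKP bound vs the actual gap for ergodic averages of two-state
# Markov chains (discrete time, stationary initial conditions; toy example).
import numpy as np

P = np.array([[0.9, 0.1], [0.2, 0.8]])        # baseline transition matrix
Q = np.array([[0.85, 0.15], [0.25, 0.75]])    # alternative transition matrix

def stat(M):                                  # stationary law of a 2x2 chain
    w = np.array([M[1, 0], M[0, 1]])
    return w / w.sum()

muP, muQ = stat(P), stat(Q)
# relative entropy rate: R(Q_{0:T}||P_{0:T}) ~ T * rate
rate = sum(muQ[x] * np.sum(Q[x] * np.log(Q[x] / P[x])) for x in range(2))

f = np.array([0.0, 1.0])                      # bounded observable, ||f|| = 1
gap = abs(muQ @ f - muP @ f)                  # |E_Q[F_T] - E_P[F_T]| = O(1)
for T in (10, 100, 1000):
    ckp = np.sqrt(2 * rate * T)               # sup-norm factor is O(1)
    print(f"T={T}: CKP bound ~ {ckp:.2f}, actual gap {gap:.3f}")
```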

  11. Gibbs variational principle, a.k.a. F = U − TS
     • Relative entropy (a.k.a. Kullback–Leibler divergence):
     $$R(Q\|P) = \begin{cases} E_Q\left[\log\frac{dQ}{dP}\right] & \text{if } Q \ll P \\ +\infty & \text{otherwise} \end{cases}$$
     $R(Q\|P)$ is a divergence, that is, $R(Q\|P)\ge 0$ and $R(Q\|P)=0$ if and only if $Q=P$.
     • Gibbs variational principle for the relative entropy (convex duality):
     $$\log E_P\left[e^f\right] = \sup_Q\,\{E_Q[f] - R(Q\|P)\}$$
     with the supremum attained if and only if
     $$dQ = dQ_f = \frac{e^f\,dP}{E_P[e^f]}.$$
     It plays a central role in statistical mechanics, in large deviation theory, and in dynamical systems.
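The variational principle is easy to verify numerically on a finite state space; a minimal sketch (mine, not from the talk):

```python
# Numerical check (my own sketch) of the Gibbs variational principle on a
# finite state space: log E_P[e^f] = sup_Q { E_Q[f] - R(Q||P) }, with the
# supremum attained at the tilted measure dQ_f = e^f dP / E_P[e^f].
import numpy as np

rng = np.random.default_rng(1)
P = np.array([0.5, 0.3, 0.2])
f = np.array([1.0, -0.5, 2.0])
kl = lambda q, p: np.sum(q * np.log(q / p))

lhs = np.log(np.sum(P * np.exp(f)))
Qf = P * np.exp(f)
Qf /= Qf.sum()                              # the optimal tilted measure
print(lhs, "==", Qf @ f - kl(Qf, P))        # equality at Q = Q_f

for _ in range(3):                          # any other Q stays below lhs
    Q = rng.dirichlet(np.ones(3))
    assert Q @ f - kl(Q, P) <= lhs + 1e-12
```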

  12. Gibbs information inequality
     From the Gibbs variational principle, for any $Q$ and $c\ge 0$,
     $$E_Q[\pm cf] \le \log E_P\left[e^{\pm cf}\right] + R(Q\|P).$$
     Theorem (Gibbs information inequality):
     $$-\underbrace{\inf_{c>0}\left\{\frac{\Lambda(-c) + R(Q\|P)}{c}\right\}}_{=\,\Xi_{P,-f}(R(Q\|P))} \;\le\; E_Q[f] - E_P[f] \;\le\; \underbrace{\inf_{c>0}\left\{\frac{\Lambda(c) + R(Q\|P)}{c}\right\}}_{=\,\Xi_{P,f}(R(Q\|P))}$$
     where
     $$\Xi_{P,f}(\eta) \equiv \inf_{c>0}\left\{\frac{\Lambda(c)+\eta}{c}\right\}, \qquad \Lambda(c) = \log E_P\left[e^{c(f-E_P[f])}\right] = \log E_P\left[e^{cf}\right] - c\,E_P[f].$$
     How good is it? (Long history: Dupuis; Bobkov; Boucheron, Lugosi, Massart; Breuer, Csiszár; etc.)
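Evaluating $\Xi_{P,f}(\eta)$ is a one-dimensional minimization over $c$; here is a short sketch (my own, on a finite state space, assuming scipy is available):

```python
# Sketch: evaluating the UQ bound Xi_{P,f}(eta) = inf_{c>0} (Lambda(c)+eta)/c
# by one-dimensional minimization, on a finite state space.
import numpy as np
from scipy.optimize import minimize_scalar

P = np.array([0.5, 0.3, 0.2])
f = np.array([1.0, -0.5, 2.0])

def Lambda(c):  # centered cumulant generating function of f under P
    return np.log(np.sum(P * np.exp(c * f))) - c * (P @ f)

def Xi(eta):
    res = minimize_scalar(lambda c: (Lambda(c) + eta) / c,
                          bounds=(1e-6, 50.0), method="bounded")
    return res.fun

for eta in (0.01, 0.1, 1.0):
    print(f"eta={eta}: Xi = {Xi(eta):.4f}")
```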

  13. Properties of the Gibbs information inequality
     $\Xi_{P,f}(R(Q\|P))$ is a divergence, i.e.
     $$\Xi_{P,f}(\eta) \ge 0 \quad\text{and}\quad \Xi_{P,f}(\eta) = 0 \;\Leftrightarrow\; \eta = 0 \text{ (i.e. } Q = P\text{) or } f = \text{const}.$$
     Moreover, the Gibbs information inequality is tight: given the family of alternative models $\mathcal{Q}_\eta = \{Q : R(Q\|P)\le\eta\}$, we have
     $$\Xi_{P,f}(\eta) = \max_{Q\in\mathcal{Q}_\eta}\,\{E_Q[f] - E_P[f]\}$$
     and the maximum is attained at $Q_\eta\in\mathcal{Q}_\eta$ with
     $$\frac{dQ_\eta}{dP} = \frac{e^{c(\eta)f}}{E_P\left[e^{c(\eta)f}\right]}$$
     with $c$ such that $R(Q_\eta\|P) = \eta$; and of course similarly for the min.
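Tightness can be checked by sweeping the tilting parameter $c$: a sketch continuing the previous snippet (it reuses `P`, `f`, and `Xi` from there); for each $c$ the tilted measure's entropy level $\eta$ and gap $E_{Q_c}[f] - E_P[f]$ should satisfy gap $= \Xi_{P,f}(\eta)$.

```python
# Sketch (reuses P, f and Xi from the previous snippet): for the tilted
# measure dQ_c/dP = e^{c f}/E_P[e^{c f}], the pair eta = R(Q_c||P) and
# gap = E_{Q_c}[f] - E_P[f] sits exactly on the bound, gap = Xi(eta).
import numpy as np

kl = lambda q, p: np.sum(q * np.log(q / p))
for c in (0.2, 0.5, 1.0):
    Qc = P * np.exp(c * f)
    Qc /= Qc.sum()
    eta, gap = kl(Qc, P), Qc @ f - P @ f
    print(f"c={c}: eta={eta:.4f}, gap={gap:.4f}, Xi(eta)={Xi(eta):.4f}")
```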

  14. Concentration / UQ duality
     Recall: if $X_1, X_2, \dots$ are IID copies with (centered) MGF $\Lambda(c)$ for $f(X)$, then by the Chernoff bound
     $$P\left[\frac{1}{N}\sum_{i=1}^N f(X_i) - E_P[f] > x\right] \le e^{-N\Lambda^*(x)} \qquad\text{(concentration)}$$
     and by the Cramér and Sanov theorems and the contraction principle
     $$\Lambda^*(x) = \sup_c\,\{cx - \Lambda(c)\} \qquad\text{(Legendre transform)}$$
     $$\phantom{\Lambda^*(x)} = \inf_Q\,\{R(Q\|P) : E_Q[f] - E_P[f] = x\} \qquad\text{("entropy maximization")}$$
     versus (duality of optimization problems)
     $$(\Lambda^*)^{-1}_{\pm}(\eta) = \inf_{c>0}\left\{\frac{\Lambda(\pm c)+\eta}{c}\right\} \qquad\text{(Fenchel–Young)}$$
     $$\phantom{(\Lambda^*)^{-1}_{\pm}(\eta)} = \sup_Q\,\{\pm(E_Q[f] - E_P[f]) : R(Q\|P) = \eta\} \qquad\text{(UQ bounds)}$$
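In the Gaussian case everything is explicit, which makes the duality easy to see (my worked example, not from the slides): $\Lambda(c) = \sigma^2 c^2/2$ gives $\Lambda^*(x) = x^2/(2\sigma^2)$, and both routes yield $(\Lambda^*)^{-1}(\eta) = \sigma\sqrt{2\eta}$.

```python
# Sketch: the Fenchel-Young inversion in the Gaussian case, where
# Lambda(c) = sigma^2 c^2/2, Lambda*(x) = x^2/(2 sigma^2), and the UQ bound
# inf_{c>0} (Lambda(c)+eta)/c equals (Lambda*)^{-1}(eta) = sigma*sqrt(2 eta).
import numpy as np
from scipy.optimize import minimize_scalar

sigma, eta = 1.5, 0.2
Lam = lambda c: 0.5 * sigma**2 * c**2

bound = minimize_scalar(lambda c: (Lam(c) + eta) / c,
                        bounds=(1e-8, 100.0), method="bounded").fun
print(bound, "==", sigma * np.sqrt(2 * eta))
```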

  15. Linearization / variance
     Linearization: for small $\eta = R(Q\|P)$ one has the asymptotic expansion
     $$\Xi_{P,f}(\eta) = \sqrt{2\,\mathrm{Var}_P[f]\,\eta} + \frac{1}{3}\,\gamma_P(f)\,\mathrm{Var}_P[f]^{1/2}\,\eta + O(\eta^{3/2})$$
     where $\gamma_P(f) = \frac{E_P[(f-E_P[f])^3]}{\mathrm{Var}_P[f]^{3/2}}$ is the skewness.
     → For small perturbations of $P$, UQ is driven by CLT fluctuations (the linear regime).
     → For large perturbations of $P$, UQ is driven by rare events, or rather concentration of measure.
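A quick check of the leading term (my sketch, finite state space, scipy assumed): as $\eta \downarrow 0$ the exact $\Xi_{P,f}(\eta)$ approaches $\sqrt{2\,\mathrm{Var}_P[f]\,\eta}$.

```python
# Sketch checking the linearization: for small eta the exact bound
# Xi_{P,f}(eta) approaches the CLT term sqrt(2 Var_P[f] eta).
import numpy as np
from scipy.optimize import minimize_scalar

P = np.array([0.5, 0.3, 0.2])
f = np.array([1.0, -0.5, 2.0])
Lambda = lambda c: np.log(np.sum(P * np.exp(c * f))) - c * (P @ f)
var = P @ f**2 - (P @ f) ** 2

for eta in (1e-1, 1e-2, 1e-3, 1e-4):
    xi = minimize_scalar(lambda c: (Lambda(c) + eta) / c,
                         bounds=(1e-8, 50.0), method="bounded").fun
    print(f"eta={eta:.0e}: Xi={xi:.5f}, CLT term={np.sqrt(2*var*eta):.5f}")
```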

  16. Markov processes: choosing the right path-space entropy
     Baseline: Markov process $X_t$ with path-space measure $P_{0:T}$.
     Alternative: stochastic process $Y_t$ with path-space measure $Q_{0:T}$ (not necessarily Markovian!) and $Q_{0:T}\ll P_{0:T}$.
     The idea is to restrict the relative entropy to a sub-$\sigma$-algebra tailored to the observables at hand.
     • Ergodic averages: apply the inequality to $F_T = \int_0^T f(X_t)\,dt$:
     $$E_Q\left[\frac{F_T}{T}\right] - E_P\left[\frac{F_T}{T}\right] \le \inf_{c>0}\frac{1}{c}\left\{\frac{1}{T}\log E_P\left[e^{c(F_T - E_P[F_T])}\right] + \frac{1}{T}R\left(Q_{0:T}^{\nu_0}\,\|\,P_{0:T}^{\mu_0}\right)\right\}$$
     Under suitable ergodicity assumptions for $X_t$, the bounds scale as $T\to\infty$. The important quantity is the relative entropy rate (it scales nicely with $T$, as we shall see later)...
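As a discrete-time stand-in for this statement (my own toy example, not the talk's continuous-time setting): the exact path-space relative entropy can be accumulated via the chain rule, and $(1/T)\,R(Q_{0:T}^{\nu_0}\|P_{0:T}^{\mu_0})$ converges to the relative entropy rate that enters the scaled bound.

```python
# Sketch (discrete-time toy chains): exact path-space relative entropy via
# the chain rule,
#   R = R(nu_0||mu_0) + sum_t E_Q[ KL(Q(X_t,.) || P(X_t,.)) ],
# showing that R/T converges to the relative entropy rate.
import numpy as np

P = np.array([[0.9, 0.1], [0.2, 0.8]])
Q = np.array([[0.85, 0.15], [0.25, 0.75]])
nu0, mu0 = np.array([0.9, 0.1]), np.array([0.5, 0.5])  # initial laws of Q, P
kl = lambda q, p: np.sum(q * np.log(q / p))

for T in (10, 100, 1000):
    R, q = kl(nu0, mu0), nu0.copy()
    for _ in range(T):
        R += sum(q[x] * kl(Q[x], P[x]) for x in range(2))
        q = q @ Q                     # marginal law of Q at the next time
    print(f"T={T}: R/T = {R/T:.6f}")
```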
