Bayesian model comparison with applications

Bayesian model comparison with applications, by Johannes Bergström (presentation transcript)



1. Bayesian model comparison with applications. Johannes Bergström, Department of Theoretical Physics, KTH Royal Institute of Technology, July 16, 2013.

2. Outline
   1. Foundations
   2. Bayesian inference
   3. Examples and applications

3. Physics: how to do it? We experiment and observe, and compare with the predictions of models. There are no perfect experiments: there is always noise and uncertainty, and limited resources, sensitivity and range. Logically deducing the true model does not work; all we can say is whether or not a model is a plausible description of the data. But how do we determine this?

4. Important information: if you really don't like statistics... you can stop listening now.

5. Principle of Bayesian inference. Bayesian inference in a nutshell: assess hypotheses/models by calculating their plausibilities, conditioned on some known and/or presumed information. Cox's theorem (1946): probability theory is the unique calculus of plausibility (given some requirements, including comparability and consistency). It is the unique extension of deductive logic that incorporates uncertainty: truth → 1, falsehood → 0.

6. Probability interpretations: what is distributed in Pr(X)?
   Bayesian probability: describes uncertainty; defined as plausibility; probability is distributed over different propositions X, but X itself is neither distributed nor random.
   Frequentist probability: describes "randomness"; defined as the long-run relative frequency of an event; X is distributed, a random variable.

7. Outline (section divider): Bayesian inference.

8. Bayesian inference: updating probabilities. Models H_1, ..., H_r; data D. Bayes' theorem:
   Pr(H_i | D) = Pr(D | H_i) Pr(H_i) / Pr(D)
   Pr(H_i): prior probability; Pr(H_i | D): posterior probability; Pr(D | H_i) = L(H_i): likelihood of H_i. For two models,
   Pr(H_i | D) / Pr(H_j | D) = [L(H_i) / L(H_j)] · [Pr(H_i) / Pr(H_j)],
   i.e. posterior odds = Bayes factor × prior odds. Usually the prior odds are taken to be 1, so one calculates either the Bayes factor or the posterior odds. If, in addition, precisely one of the H_i is assumed correct, the posterior probabilities Pr(H_i | D) are well defined (normalized).
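The odds form of Bayes' theorem on this slide can be sketched in a few lines of Python; the numerical inputs below are hypothetical, not taken from the talk:

```python
# Sketch of the updating rule: posterior odds = Bayes factor * prior odds,
# and the posterior probability when exactly one of two models H1, H2 is correct.
def posterior_odds(bayes_factor, prior_odds=1.0):
    """Posterior odds of H1 over H2."""
    return bayes_factor * prior_odds

def posterior_probability(bayes_factor, prior_odds=1.0):
    """Pr(H1 | D), assuming precisely one of H1, H2 is correct."""
    odds = posterior_odds(bayes_factor, prior_odds)
    return odds / (1.0 + odds)

# A Bayes factor of 3 with even prior odds gives Pr(H1 | D) = 0.75.
print(posterior_probability(3.0))  # 0.75
```

With prior odds of 1 the Bayes factor and the posterior odds coincide, which is why the slide says one can calculate either.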

9. Model likelihood or evidence. Models usually have free parameters Θ. The likelihood for the model, the evidence, is
   L(H) = Pr(D | H) = ∫ Pr(D | Θ, H) Pr(Θ | H) d^N Θ = ∫ L(Θ) π(Θ) d^N Θ,
   i.e. the model likelihood is the average likelihood of the model parameters. Here π(Θ) is the prior distribution: the plausibility of the parameters assuming the model is correct. The evidence balances quality of fit against model complexity, and so can favour the simpler model. All probabilities are conditioned on the relevant background information (models, experimental setup, ...).

10. Occam's razor. The evidence is the probability with which the model predicted the observed data. Occam's razor: "simple" ≡ predictive. Complex models are compatible with a large variety of data and therefore predict less. [Figure: Pr(D | H) versus the possible observations D, for a simpler and a more complex model.]

11. Jeffreys scale. The scale of interpretation is easily calibrated:

   |log(odds)|   odds      Pr(H_1 | D)   Interpretation
   < 1.0         ≲ 3:1     ≲ 0.75        Inconclusive
   1.0           ≃ 3:1     ≃ 0.75        Weak evidence
   2.5           ≃ 12:1    ≃ 0.92        Moderate evidence
   5.0           ≃ 150:1   ≃ 0.993      Strong evidence
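The Jeffreys scale is just a lookup on |log(odds)|; a direct transcription of the table (using the thresholds as stated on this slide):

```python
def jeffreys_label(log_odds):
    """Map |ln(odds)| to the interpretation on the Jeffreys scale."""
    x = abs(log_odds)
    if x < 1.0:
        return "inconclusive"
    if x < 2.5:
        return "weak evidence"
    if x < 5.0:
        return "moderate evidence"
    return "strong evidence"

print(jeffreys_label(0.5))  # inconclusive
print(jeffreys_label(5.0))  # strong evidence
```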

12. Priors. One must specify priors on all model parameters, and the results are not invariant under general reparametrizations. Priors are an important part of a Bayesian analysis and should be considered carefully. A uniform prior in whichever variable you happen to be writing your equations in (signal rate, cross-section) is often a bad choice; an improper prior is always a bad choice. Evaluate the sensitivity of the results to the prior choice.
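The non-invariance under reparametrization mentioned above is easy to demonstrate: a prior uniform in a rate x is very different from a prior uniform in log x, even though both might be called "uninformative". A sketch with hypothetical numbers:

```python
import math
import random

rng = random.Random(0)
n = 100_000

# Prior uniform in x on [0.1, 10] versus prior uniform in log(x) on the
# same range (the range and threshold below are illustrative choices).
uniform_x = [rng.uniform(0.1, 10.0) for _ in range(n)]
uniform_logx = [math.exp(rng.uniform(math.log(0.1), math.log(10.0)))
                for _ in range(n)]

# Prior probability assigned to x < 1 under each parametrization:
p_x = sum(x < 1.0 for x in uniform_x) / n        # about 0.09
p_logx = sum(x < 1.0 for x in uniform_logx) / n  # about 0.5
print(p_x, p_logx)
```

The same region of parameter space gets roughly 9% of the prior mass in one parametrization and 50% in the other, which is why the slide urges evaluating prior sensitivity.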

13. Parameter inference: the posterior distribution. Assuming the model H is correct, infer its parameters:
   Pr(Θ | D, H) = Pr(D | Θ, H) Pr(Θ | H) / Pr(D | H) = L(Θ) π(Θ) / L(H).
   The posterior of a subset of the parameters is obtained by integrating over the other parameters. The posterior is, by definition, not enough to test or compare models or to claim discoveries.
   Comparing models using the posterior: compare a nested model with η = η_0 using the Savage-Dickey density ratio,
   L(η = η_0) / L(η ≠ η_0) = Pr(η_0 | D, H) / π(η_0 | H) = (posterior at η_0) / (prior at η_0).
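For a Gaussian toy model the Savage-Dickey ratio above can be evaluated in closed form, since the posterior is also Gaussian. A sketch (the prior width, measurement error, and datum are hypothetical choices, not from the talk):

```python
from statistics import NormalDist

# Savage-Dickey sketch: prior eta ~ N(0, tau^2), one datum d ~ N(eta, sigma^2),
# nested model fixes eta = eta0 = 0. All numbers are illustrative.
tau, sigma, d, eta0 = 2.0, 0.5, 1.2, 0.0

# Conjugate Gaussian posterior for eta:
post_var = 1.0 / (1.0 / tau**2 + 1.0 / sigma**2)
post_mean = post_var * d / sigma**2

# Bayes factor in favour of eta = eta0: posterior density at eta0
# divided by prior density at eta0.
bayes_factor = (NormalDist(post_mean, post_var**0.5).pdf(eta0)
                / NormalDist(0.0, tau).pdf(eta0))
print(bayes_factor)
```

Here the datum sits over two sigma from eta0, so the ratio comes out below 1 (around 0.3), i.e. weak evidence against the nested model, which on the Jeffreys scale would still be inconclusive-to-weak.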

14. Frequentist model evaluation: p-values. The p-value is the probability of obtaining data equal to or more extreme than that observed, assuming H_0, where "extreme" means a large value of a test statistic (χ², profile likelihood, ...). It is converted into a "number of σ's" using the Gaussian CDF: S = Φ^{-1}(1 - p).
   P-values are not (see also D'Agostini, 1112.3620): the probability that H_0 is correct; the probability that the data are "just a fluctuation"; the probability of incorrectly rejecting H_0; the type-1 error rate α (0.05, 0.01, ...). Interpretation would need a uniform scale, which is not really possible.
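The conversion S = Φ^{-1}(1 - p) quoted on this slide is one line with the standard-normal inverse CDF from the Python standard library:

```python
from statistics import NormalDist

def sigmas(p):
    """Number of sigmas S = Phi^{-1}(1 - p) for a one-sided p-value."""
    return NormalDist().inv_cdf(1.0 - p)

# For p ~ 1.5e-3 (the theta13 p-value of this talk) this is just under 3 sigma.
print(sigmas(1.5e-3))
```

Note this is the one-sided convention; a two-sided convention would use Φ^{-1}(1 - p/2) instead.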

15. Model comparison in particle physics. In particle physics, model comparison is used to compare ("test") different models and to test the existence of "new physics". Discovery is primary; the precise parameter values describing the new physics are often secondary. Possible applications:
   θ_13 = 0 vs. θ_13 > 0
   CP violation vs. CP conservation
   Normal vs. inverted ordering
   Maximal vs. non-maximal θ_23
   Evidence of effects of neutrino mass: 0νββ, β-decay, cosmology
   Theoretical models of lepton mass, flavour, DM, ...

16. Outline (section divider): Examples and applications.

17. Leptonic mixing angle θ_13: flashback to fall 2011. Question: is θ_13 = 0 or not?
   Profile likelihood ratio (Schwetz, Tórtola, Valle, 1108.1376): L(θ_13^max) / L(θ_13 = 0) ≃ 150 (Δχ² ≃ 10) ⇒ p ≃ 1.5 · 10^-3.
   Model comparison (Bergström, 1205.4404): compare the model θ_13 > 0 (θ_13 ∈ [0, π/2]) with the model θ_13 = 0. The compact parameter space gives robust results. Approximating L(θ_13) ∝~ L_profile(θ_13) gives L(θ_13 > 0) / L(θ_13 = 0) ≃ 3: barely weak preference for θ_13 > 0. Assigning a 0.5 prior gives Pr(θ_13 = 0 | D) ≃ 0.25.
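As a quick consistency check of the numbers on this slide: the maximum-likelihood ratio follows from Δχ² via the standard relation exp(Δχ²/2), which is where the quoted "≃ 150" comes from.

```python
import math

# Maximum-likelihood ratio implied by the quoted Delta chi^2 ~= 10:
# L(theta13_max) / L(theta13 = 0) = exp(Delta_chi2 / 2).
delta_chi2 = 10.0
ratio = math.exp(delta_chi2 / 2)
print(ratio)  # ~148, i.e. the "~150" on the slide
```

The point of the slide is that after averaging over the prior, this ratio of ~150 at the best-fit point shrinks to an evidence ratio of only ~3.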

18. Leptonic mixing angle θ_23: today. Question: θ_23 is large, but is it maximal (π/4) or not?
   Profile likelihood (for NO), NuFit v1.1: www.nu-fit.org, 1209.3023 (Gonzalez-Garcia, Maltoni, Salvado, Schwetz): L(θ_23^max) / L(θ_23 = π/4) ≃ 2.5 (Δχ² ≃ 1.8) ⇒ p ≃ 0.18.
   [Figure: profile likelihood versus sin²θ_23 over the range 0.3-0.7.]

19. Leptonic mixing angle θ_23: today. Model comparison: use L(s²_23) ∝~ L_profile(s²_23) and π(s²_23) = 1, and compare the model likelihoods: L(θ_23 ≠ π/4) / L(θ_23 = π/4) ≃ 0.3. Maximal mixing is (weakly) preferred by the data: the model with maximal θ_23 is (slightly) better than the non-maximal model. Assigning a 0.5 prior gives Pr(θ_23 = π/4 | D) ≃ 0.75.
   Octant comparison: L(θ_23 < π/4) / L(θ_23 > π/4) ≃ 2.
   Future prospects: strong evidence for maximal mixing would require an uncertainty on s²_23 of roughly 0.002 (0.02 for moderate evidence).

20. Neutrino parameters and cosmology. Cosmological data are sensitive to N_eff (Planck collaboration, 1303.5076). [Figure: marginalized posteriors P/P_max of N_eff for Planck+WP+highL, and with +BAO and +H_0 added, over the range N_eff ≃ 2.4-4.2.] How much evidence is there against N_eff = 3.046?
