A Test Statistic for Weighted Runs Frederik Beaujean, Allen Caldwell http://arxiv.org/abs/1005.3233v2 COMPSTAT 2010 Paris, 23.8.2010
Motivating example
Suppose:
● Measurements $y_i$ with Gaussian uncertainty
● Standard Model (SM) background is quadratic
● New physics (NP) predicts signal peak
23.08.2010 Frederik Beaujean #2
Goodness of Fit: standard approach
● Test statistic: any scalar function of the data, T(D)
● Interpret: large T(D) = poor model
● Example: probability density of the data, $P(D \mid \theta) \propto \prod_i \exp\left\{ -\frac{(y_i - f(x_i \mid \theta))^2}{2\sigma_i^2} \right\} = \exp\left\{ -\frac{\chi^2}{2} \right\}$
● Familiar choice: $T(D) \equiv \chi^2(D)$
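The familiar choice above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `chi_square` and the four data points are made up, and the model predictions are simply passed in as a list.

```python
def chi_square(y, f, sigma):
    """Sum of squared standardized residuals, sum_i ((y_i - f_i) / sigma_i)^2."""
    return sum(((yi - fi) / si) ** 2 for yi, fi, si in zip(y, f, sigma))


# Hypothetical data: four measurements, model predictions f(x_i), unit uncertainties.
y = [1.0, -0.5, 2.0, 0.0]
f = [0.0, 0.0, 0.0, 0.0]
sigma = [1.0, 1.0, 1.0, 1.0]

t_obs = chi_square(y, f, sigma)
print(t_obs)
```

A large value of `t_obs` relative to the number of data points would indicate a poor model in the sense of the slide.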
p-value
● Def: $p \equiv P(T \geq T(D))$
● Assuming the model and before data is taken: p uniform in [0,1]
● Critical values: p < 0.05, 0.01 ⇒ reject model
● Warning: the p-value is not the probability that the model is true
● Example: $p_{SM}$ = 10%, $p_{NP}$ = 37% ⇒ both OK
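The definition can be illustrated by simulation: draw many data sets from the model, compute T for each, and count how often it is at least as large as the observed value. The sketch below assumes standardized Gaussian residuals, a model with no fitted parameters, and the sum of squares as test statistic; `mc_p_value` and its arguments are hypothetical names.

```python
import random


def chi_square(residuals):
    """Sum of squared standardized residuals."""
    return sum(r * r for r in residuals)


def mc_p_value(t_obs, n_points, n_sim=20000, seed=1):
    """Estimate P(T >= t_obs) under the model by simulating Gaussian data sets."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_sim):
        sim = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        if chi_square(sim) >= t_obs:
            count += 1
    return count / n_sim


# 37.65 is the tabulated 5% critical value of chi-square with 25 degrees of freedom.
p = mc_p_value(t_obs=37.65, n_points=25)
print(p)
```

The estimate should come out close to 0.05, up to Monte Carlo noise. The same loop works for any other scalar statistic by swapping out `chi_square`.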
Runs
● Most statistics disregard the order of the data, so information is wasted
● The human brain is good at simple problems
● Example: N = 25 data points, each Gaussian with mean = 0 and variance = 1
Can we combine information about the order and the magnitude of the deviations?
Runs statistic
Proposal:
● Split data into runs
● Each run has a weight
● Gaussian case: [formula missing from extracted text]
● Test statistic T: largest weight of any run
● p-value becomes: [formula missing from extracted text]
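One way to make the proposal concrete is sketched below. The slide text does not spell out how runs are formed or weighted, so two assumptions are made here and should be checked against the paper: a run is a maximal sequence of consecutive observations above the model prediction, and its weight is the sum of squared standardized residuals inside the run. All function names are hypothetical.

```python
def success_runs(residuals):
    """Split standardized residuals into maximal runs of consecutive positive values.

    Assumption: only observations above the expectation form runs.
    """
    runs, current = [], []
    for r in residuals:
        if r > 0:
            current.append(r)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def run_weight(run):
    """Assumed weight of a run: sum of squared standardized residuals."""
    return sum(r * r for r in run)


def runs_statistic(residuals):
    """Largest weight of any run; 0.0 if there is no run at all."""
    return max((run_weight(run) for run in success_runs(residuals)), default=0.0)


# Hypothetical standardized residuals (y_i - f(x_i)) / sigma_i, in measurement order.
residuals = [0.5, 1.0, -0.3, 2.0, 1.0, 1.0, -1.5, 0.2]
t_obs = runs_statistic(residuals)
print(success_runs(residuals), t_obs)
```

Unlike the plain sum of squares, this statistic changes when the same residuals are reordered, which is the point of using runs.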
Runs distribution
● Gaussian case: distribution of T calculated exactly for any N (non-parametric)
● Requires a sum over integer partitions
[Figure: N = 25]
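To give a feeling for the sum over integer partitions, the sketch below enumerates all partitions of N. It only shows the enumeration, not how each partition enters the exact distribution of T; the generator `partitions` is an illustrative helper, not taken from the talk.

```python
def partitions(n, max_part=None):
    """Yield every partition of n as a non-increasing list of positive integers."""
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        yield []
        return
    for first in range(max_part, 0, -1):
        # The remaining parts must not exceed the first part.
        for rest in partitions(n - first, first):
            yield [first] + rest


print(list(partitions(4)))
count_25 = sum(1 for _ in partitions(25))
print(count_25)
```

For N = 25 there are 1958 partitions, so an exact calculation is cheap at this size; the count grows quickly with N.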
Power
● New physics contribution: Lorentz peak with amplitude A
● At the 5% level, T is up to 35% more powerful than the classic $\chi^2$ in detecting departures of type y(x)
Conclusions
● Choose a statistic with specific alternative models in mind
● Runs statistic T excellent for “bump hunting”
FINIS
Backup
Exact runs distribution I
Exact runs distribution II
Exact runs distribution III
Computational complexity: Integer partitions
Goodness of Fit: Bayesian approach
● Model selection: need explicit alternatives $M_1$, $M_2$
● Posterior odds: $\frac{P(M_1 \mid D)}{P(M_2 \mid D)} = \frac{P(M_1)}{P(M_2)} \times \frac{P(D \mid M_1)}{P(D \mid M_2)}$
● Bayes factor: (very) sensitive to parameter range, $P(D \mid M_1) = \int P(D \mid \theta) \, P_0(\theta) \, d\theta$
● Occam's razor built in
● Example: $\frac{P(SM \mid D)}{P(NP \mid D)} = \frac{P(SM)}{P(NP)} \times 61.7$
● Six (NP) vs three (SM) parameters
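The arithmetic of the example can be sketched as follows. Two assumptions are not on the slide: SM and NP are treated as the only models under consideration, and the prior odds are set to 1 purely for illustration. The function names are hypothetical.

```python
def posterior_odds(prior_odds, bayes_factor):
    """Posterior odds = prior odds x Bayes factor."""
    return prior_odds * bayes_factor


def odds_to_probability(odds):
    """Convert odds for M1 vs M2 into P(M1 | D), assuming M1 and M2 are exhaustive."""
    return odds / (1.0 + odds)


# Bayes factor 61.7 in favour of SM is from the slide; equal priors are an assumption.
odds = posterior_odds(prior_odds=1.0, bayes_factor=61.7)
p_sm = odds_to_probability(odds)
print(odds, p_sm)
```

With equal priors the SM posterior probability is 61.7 / 62.7, a little above 98%. A different prior ratio rescales the odds linearly.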