  1. Some Statistical Tools for Particle Physics. Particle Physics Colloquium, MPI für Physik u. Astrophysik, Munich, 10 May, 2016. Glen Cowan, Physics Department, Royal Holloway, University of London, g.cowan@rhul.ac.uk, www.pp.rhul.ac.uk/~cowan. MPI Seminar 2016 / Statistics for Particle Physics 1 G. Cowan

  2. Outline
1) Brief review of the HEP context and of statistical tests.
2) Statistical tests based on the profile likelihood ratio.
3) A measure of discovery sensitivity is often used to plan a future analysis. For example, $s/\sqrt{b}$ gives the approximate expected discovery significance (test of $s = 0$) when counting $n \sim \text{Poisson}(s + b)$. A measure of discovery significance is proposed that takes into account the uncertainty in the background rate.
4) Brief comment on importing tools from Machine Learning and on the choice of variables for multivariate analysis.
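To see why $s/\sqrt{b}$ is only approximate, the sketch below compares it with the large-sample median discovery significance for a counting experiment whose background is known exactly, $Z_A = \sqrt{2((s+b)\ln(1 + s/b) - s)}$, taken from CCGV (arXiv:1007.1727). It does not implement the measure with background uncertainty that the talk proposes; the function names and the $(s, b)$ values are illustrative.

```python
import math


def z_simple(s: float, b: float) -> float:
    """Rough expected discovery significance s / sqrt(b)."""
    return s / math.sqrt(b)


def z_asimov(s: float, b: float) -> float:
    """Median discovery significance for n ~ Poisson(s + b), b known exactly
    (large-sample formula from CCGV, arXiv:1007.1727)."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


# Illustrative (s, b) pairs, from s << b to s >> b.
for s, b in [(1.0, 100.0), (5.0, 10.0), (5.0, 1.0)]:
    print(f"s={s:5.1f} b={b:6.1f}  s/sqrt(b)={z_simple(s, b):6.3f}  Z_A={z_asimov(s, b):6.3f}")
```

For $s \ll b$ the two agree closely. As $s$ becomes comparable to or larger than $b$, $s/\sqrt{b}$ increasingly overstates the expected significance.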

  3. Data analysis in particle physics
Particle physics experiments are expensive (e.g., the LHC, roughly 10^10 dollars for the accelerator and experiments), the competition is intense (ATLAS vs. CMS, vs. many others), and the stakes are high. [Figure: "4 sigma effect" vs. "5 sigma effect".] Hence the increased interest in advanced statistical methods.

  4. Prototypical HEP analyses
Select events with properties characteristic of the signal process (invariably some background events are selected as well).
Case #1: The existence of the signal process is already well established (e.g., production of top quarks). Study the properties of signal events (e.g., measure the top quark mass, production cross section, decay properties, ...).
Statistics issues:
Event selection → multivariate classifiers
Parameter estimation (usually maximum likelihood or least squares)
Bias and variance of estimators; goodness-of-fit
Unfolding (deconvolution)

  5. Prototypical analyses (cont.): a “search”
Case #2: The existence of the signal process is not yet established. The goal is to see if it exists by rejecting the background-only hypothesis.
$H_0$: All of the selected events are background (usually meaning “standard model” or events from known processes).
$H_1$: The selected events contain a mixture of background and signal.
Statistics issues: Optimality (power) of the statistical test. Rejection of $H_0$ is usually based on a p-value $< 2.9 \times 10^{-7}$ (5σ). There is some recent interest in the use of Bayes factors. In the absence of a discovery, exclusion limits are set on the parameters of signal models (frequentist, Bayesian, “CLs”, ...).

  6. (Frequentist) statistical tests
Consider a test of a parameter $\mu$, e.g., one proportional to a cross section. The result of the measurement is a set of numbers $x$. To define a test of $\mu$, specify a critical region $w_\mu$ such that the probability to find $x \in w_\mu$ is not greater than $\alpha$ (the size or significance level):
$$P(x \in w_\mu \mid \mu) \le \alpha .$$
(We must use an inequality since $x$ may be discrete, so there may not exist a subset of the data space with probability of exactly $\alpha$.)
Equivalently, define a p-value $p_\mu$ equal to the probability, assuming $\mu$, to find data at least as “extreme” as the data observed. The critical region of a test of size $\alpha$ can be defined from the set of data outcomes with $p_\mu < \alpha$. Often one uses, e.g., $\alpha = 0.05$. If $x \in w_\mu$ is observed, reject $\mu$.
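The point about discrete data can be made concrete with a hypothetical example that is not on the slide: a single Poisson count $n$ with mean $b$ under the tested hypothesis, and a critical region of the form $n \ge n_c$. Because the achievable size jumps in steps, one takes the smallest $n_c$ whose size does not exceed $\alpha$. A minimal sketch:

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(n = k) for n ~ Poisson(lam), lam > 0, evaluated in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_sf(n_c: int, lam: float) -> float:
    """P(n >= n_c) for n ~ Poisson(lam)."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(n_c))


def critical_region(b: float, alpha: float) -> tuple:
    """Smallest n_c with P(n >= n_c | b) <= alpha; returns (n_c, actual size)."""
    n_c = 0
    while poisson_sf(n_c, b) > alpha:
        n_c += 1
    return n_c, poisson_sf(n_c, b)


n_c, size = critical_region(b=3.0, alpha=0.05)
print(n_c, round(size, 4))  # 7 0.0335: the size is strictly below alpha
```

Here no region of the form $n \ge n_c$ has a size of exactly 0.05. Taking $n_c = 6$ would give a size of about 0.084, which is too large.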

  7. Test statistics and p-values
Often one constructs a scalar test statistic, $q_\mu(x)$, which reflects the level of agreement between the data and the hypothesized value $\mu$. For examples of statistics based on the profile likelihood ratio, see, e.g., CCGV, EPJC 71 (2011) 1554; arXiv:1007.1727. Usually $q_\mu$ is defined such that higher values represent increasing incompatibility with the data, so that the p-value of $\mu$ is
$$p_\mu = \int_{q_{\mu,\mathrm{obs}}}^{\infty} f(q_\mu \mid \mu)\, dq_\mu ,$$
where $q_{\mu,\mathrm{obs}}$ is the observed value of $q_\mu$ and $f(q_\mu \mid \mu)$ is the pdf of $q_\mu$ assuming $\mu$. An equivalent formulation of the test is to reject $\mu$ if $p_\mu < \alpha$.

  8. Confidence interval from inversion of a test
Carry out a test of size $\alpha$ for all values of $\mu$. The values that are not rejected constitute a confidence interval for $\mu$ at confidence level $\mathrm{CL} = 1 - \alpha$. By construction, the confidence interval contains the true value of $\mu$ with a probability of at least $1 - \alpha$. The interval depends on the choice of the critical region of the test. Put the critical region where the data are likely to be under the assumption of the relevant alternative to the $\mu$ being tested.
Test $\mu = 0$, alternative is $\mu > 0$: test for discovery.
Test $\mu = \mu_0$, alternative is $\mu = 0$: testing all $\mu_0$ gives an upper limit.
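To illustrate the inversion, take a hypothetical Poisson count with known background $b$ and signal mean $s$, where $s$ plays the role of $\mu$. Testing each $s$ against the alternative $s = 0$ puts the critical region at low $n$. The values of $s$ that are not rejected at size $\alpha$ then form the classical upper limit. A minimal sketch by bisection, with illustrative function names:

```python
import math


def poisson_cdf(n_obs: int, lam: float) -> float:
    """P(n <= n_obs) for n ~ Poisson(lam), summing terms iteratively."""
    term, total = math.exp(-lam), 0.0
    for k in range(n_obs + 1):
        if k > 0:
            term *= lam / k
        total += term
    return total


def upper_limit(n_obs: int, b: float, alpha: float = 0.05, tol: float = 1e-10) -> float:
    """Largest s not rejected, i.e. P(n <= n_obs | s + b) >= alpha.

    The critical region for testing s lies at low n (alternative s = 0),
    and P(n <= n_obs | s + b) decreases monotonically in s, so bisect."""
    if poisson_cdf(n_obs, b) < alpha:
        return 0.0  # even s = 0 is rejected; 0.0 is returned as a sentinel
    lo, hi = 0.0, 1.0
    while poisson_cdf(n_obs, hi + b) >= alpha:  # expand bracket upward
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + b) >= alpha:
            lo = mid
        else:
            hi = mid
    return lo


print(upper_limit(0, 0.0))  # ~2.996, i.e. -ln(0.05)
print(upper_limit(0, 1.0))  # ~1.996
```

This is the plain Neyman construction. It does not apply the CLs modification mentioned earlier.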

  9. p-value for discovery
A large $q_0$ means increasing incompatibility between the data and the hypothesis. Therefore the p-value for an observed $q_{0,\mathrm{obs}}$ is
$$p_0 = \int_{q_{0,\mathrm{obs}}}^{\infty} f(q_0 \mid 0)\, dq_0$$
(an explicit formula for this is derived later). From the p-value, obtain the equivalent significance
$$Z = \Phi^{-1}(1 - p).$$

  10. Significance from p-value
Often the significance $Z$ is defined as the number of standard deviations that a Gaussian variable would fluctuate in one direction to give the same p-value:
$$p = \int_Z^{\infty} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx = 1 - \Phi(Z) \qquad \text{(ROOT: 1 - TMath::Freq)}$$
$$Z = \Phi^{-1}(1 - p) \qquad \text{(ROOT: TMath::NormQuantile)}$$
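Outside ROOT, the same conversions take only a few lines of standard-library Python. This is a sketch with illustrative function names. It uses `math.erfc` for the upper tail and `statistics.NormalDist.inv_cdf` for the quantile. Writing $\Phi^{-1}(1 - p)$ as $-\Phi^{-1}(p)$ avoids the rounding of $1 - p$ when $p$ is tiny.

```python
import math
from statistics import NormalDist


def p_from_z(z: float) -> float:
    """One-sided p-value p = 1 - Phi(Z), via erfc for accuracy in the tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def z_from_p(p: float) -> float:
    """Significance Z = Phi^{-1}(1 - p), computed as -Phi^{-1}(p) by symmetry."""
    return -NormalDist().inv_cdf(p)


print(f"{p_from_z(5.0):.3g}")   # 2.87e-07, the 5 sigma discovery threshold
print(f"{z_from_p(0.05):.3f}")  # 1.645
```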

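  11. Prototype search analysis
Search for a signal in a region of phase space. The result is a histogram of some variable $x$ giving the numbers
$$\mathbf{n} = (n_1, \ldots, n_N).$$
Assume the $n_i$ are Poisson distributed with expectation values
$$E[n_i] = \mu s_i + b_i ,$$
where $\mu$ is the strength parameter and
$$s_i = s_{\mathrm{tot}} \int_{\mathrm{bin}\ i} f_s(x; \boldsymbol{\theta}_s)\, dx \quad \text{(signal)}, \qquad b_i = b_{\mathrm{tot}} \int_{\mathrm{bin}\ i} f_b(x; \boldsymbol{\theta}_b)\, dx \quad \text{(background)}.$$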

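  12. Prototype analysis (II)
Often there is also a subsidiary measurement that constrains some of the background and/or shape parameters:
$$\mathbf{m} = (m_1, \ldots, m_M).$$
Assume the $m_i$ are Poisson distributed with expectation values
$$E[m_i] = u_i(\boldsymbol{\theta}_s, \boldsymbol{\theta}_b, b_{\mathrm{tot}}),$$
with nuisance parameters $\boldsymbol{\theta} = (\boldsymbol{\theta}_s, \boldsymbol{\theta}_b, b_{\mathrm{tot}})$. The likelihood function is
$$L(\mu, \boldsymbol{\theta}) = \prod_{j=1}^{N} \frac{(\mu s_j + b_j)^{n_j}}{n_j!} e^{-(\mu s_j + b_j)} \prod_{k=1}^{M} \frac{u_k^{m_k}}{m_k!} e^{-u_k}.$$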
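In practice this likelihood is evaluated as a log-likelihood. Below is a minimal sketch in plain Python. It assumes the bin expectations $s_i$, $b_i$ and $u_k$ have already been computed for a given $\boldsymbol{\theta}$; how they depend on $\boldsymbol{\theta}_s$, $\boldsymbol{\theta}_b$, $b_{\mathrm{tot}}$ is left to the caller. The function names and example numbers are hypothetical.

```python
import math
from typing import Sequence


def log_poisson(k: int, nu: float) -> float:
    """ln[nu^k e^(-nu) / k!], with the convention 0 * ln(0) = 0."""
    return (k * math.log(nu) if k > 0 else 0.0) - nu - math.lgamma(k + 1)


def log_likelihood(mu: float,
                   n: Sequence[int], s: Sequence[float], b: Sequence[float],
                   m: Sequence[int], u: Sequence[float]) -> float:
    """ln L(mu, theta): main histogram n plus subsidiary histogram m.

    s, b, u are the expectations s_i, b_i, u_k already evaluated at some theta."""
    if not (len(n) == len(s) == len(b)) or len(m) != len(u):
        raise ValueError("bin arrays must have matching lengths")
    main = sum(log_poisson(nj, mu * sj + bj) for nj, sj, bj in zip(n, s, b))
    subsidiary = sum(log_poisson(mk, uk) for mk, uk in zip(m, u))
    return main + subsidiary


# Hypothetical numbers: 3 signal-region bins, 2 control-region bins.
print(log_likelihood(1.0, n=[3, 5, 2], s=[0.5, 2.0, 0.5], b=[2.5, 3.0, 1.5],
                     m=[10, 8], u=[9.5, 8.5]))
```

Working with $\ln L$ avoids underflow of the product over many bins. The $\ln n_j!$ terms do not depend on the parameters, so they cancel in likelihood ratios.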

  13. The profile likelihood ratio
Base the significance test on the profile likelihood ratio:
$$\lambda(\mu) = \frac{L(\mu, \hat{\hat{\boldsymbol{\theta}}}(\mu))}{L(\hat{\mu}, \hat{\boldsymbol{\theta}})},$$
where $\hat{\hat{\boldsymbol{\theta}}}(\mu)$ maximizes $L$ for the specified $\mu$, and $\hat{\mu}, \hat{\boldsymbol{\theta}}$ maximize $L$ overall. The likelihood ratio of point hypotheses, e.g., $\lambda = L(\mu, \boldsymbol{\theta}) / L(0, \boldsymbol{\theta})$, gives the optimum test (Neyman-Pearson lemma). But the distribution of this statistic depends in general on the nuisance parameters $\boldsymbol{\theta}$, and one can only reject $\mu$ if it is rejected for all $\boldsymbol{\theta}$. The advantage of using the profile likelihood ratio is that the asymptotic (large-sample) distribution of $-2 \ln \lambda(\mu)$ approaches a chi-square form independent of the nuisance parameters $\boldsymbol{\theta}$.
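To make the profiling step concrete, consider the simplest case. This is an assumption chosen for illustration, not the general model above: one signal-region count $n \sim \text{Poisson}(\mu s + b)$ and one control-region count $m \sim \text{Poisson}(\tau b)$, with $s$ and $\tau$ known and $b$ the only nuisance parameter. Setting $\partial \ln L / \partial b = 0$ at fixed $\mu$ gives a quadratic for $\hat{\hat{b}}(\mu)$. The global estimates are $\hat{\mu} = (n - m/\tau)/s$ and $\hat{b} = m/\tau$. A sketch with hypothetical function names, intended for $\mu \ge 0$:

```python
import math


def xlogy(x: float, y: float) -> float:
    """x * ln(y) with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def log_l(mu: float, b: float, n: int, m: int, s: float, tau: float) -> float:
    """ln L(mu, b) up to constants, for n ~ Pois(mu*s + b), m ~ Pois(tau*b)."""
    nu = mu * s + b
    return xlogy(n, nu) - nu + xlogy(m, tau * b) - tau * b


def b_hathat(mu: float, n: int, m: int, s: float, tau: float) -> float:
    """Conditional MLE of b at fixed mu: positive root of
    (1+tau) b^2 + ((1+tau) mu s - n - m) b - m mu s = 0."""
    a = n + m - (1.0 + tau) * mu * s
    return (a + math.sqrt(a * a + 4.0 * (1.0 + tau) * m * mu * s)) / (2.0 * (1.0 + tau))


def minus2_log_lambda(mu: float, n: int, m: int, s: float, tau: float) -> float:
    """-2 ln lambda(mu) = -2 ln[L(mu, b_hathat(mu)) / L(mu_hat, b_hat)]."""
    b_hat = m / tau
    mu_hat = (n - b_hat) / s  # unconstrained: may be negative
    num = log_l(mu, b_hathat(mu, n, m, s, tau), n, m, s, tau)
    den = log_l(mu_hat, b_hat, n, m, s, tau)
    return -2.0 * (num - den)


# Hypothetical data: n = 10, m = 4, s = 1, tau = 1 (so mu_hat = 6).
print(f"{minus2_log_lambda(0.0, n=10, m=4, s=1.0, tau=1.0):.3f}")  # 2.657
```

With more bins or nuisance parameters, $\hat{\hat{\boldsymbol{\theta}}}$ generally has no closed form and is found by numerical maximization. The structure of the ratio stays the same.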

  14. Test statistic for discovery
Try to reject the background-only ($\mu = 0$) hypothesis using
$$q_0 = \begin{cases} -2 \ln \lambda(0) & \hat{\mu} \ge 0, \\ 0 & \hat{\mu} < 0, \end{cases}$$
i.e., here only an upward fluctuation of the data is regarded as evidence against the background-only hypothesis. Note that even though physically $\mu \ge 0$ here, we allow $\hat{\mu}$ to be negative. In the large-sample limit its distribution becomes Gaussian, and this allows us to write down simple expressions for the distributions of our test statistics.
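As an illustration, take a hypothetical single-bin case with no nuisance parameters ($b > 0$ known). With $n \sim \text{Poisson}(\mu s + b)$, the estimator is $\hat{\mu} = (n - b)/s$, which is negative whenever $n < b$. The ratio $\lambda(0) = L(0)/L(\hat{\mu})$ gives $-2\ln\lambda(0) = 2[n\ln(n/b) - (n - b)]$. A sketch with illustrative names:

```python
import math


def mu_hat(n: int, b: float, s: float) -> float:
    """ML estimator of mu for n ~ Poisson(mu*s + b) with b known; may be negative."""
    return (n - b) / s


def q0(n: int, b: float) -> float:
    """Discovery statistic: -2 ln lambda(0) if mu_hat >= 0, else 0."""
    if n <= b:  # mu_hat <= 0: a downward fluctuation is not evidence against mu = 0
        return 0.0
    return 2.0 * (n * math.log(n / b) - (n - b))


for n in (5, 10, 20):
    print(n, mu_hat(n, b=10.0, s=2.0), q0(n, b=10.0), math.sqrt(q0(n, b=10.0)))
```

In the large-sample limit, the discovery significance is then $Z = \sqrt{q_0}$.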

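  15. Distribution of $q_0$ in the large-sample limit (Cowan, Cranmer, Gross, Vitells, arXiv:1007.1727, EPJC 71 (2011) 1554)
Assuming approximations valid in the large-sample (asymptotic) limit, we can write down the full distribution of $q_0$ as
$$f(q_0 \mid \mu') = \left(1 - \Phi\!\left(\frac{\mu'}{\sigma}\right)\right) \delta(q_0) + \frac{1}{2} \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{q_0}} \exp\!\left[-\frac{1}{2}\left(\sqrt{q_0} - \frac{\mu'}{\sigma}\right)^2\right],$$
where $\sigma$ is the standard deviation of $\hat{\mu}$. The special case $\mu' = 0$ is a “half chi-square” distribution:
$$f(q_0 \mid 0) = \frac{1}{2}\delta(q_0) + \frac{1}{2} \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{q_0}} e^{-q_0/2}.$$
In the large-sample limit, $f(q_0 \mid 0)$ is independent of the nuisance parameters, while $f(q_0 \mid \mu')$ depends on the nuisance parameters through $\sigma$.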

  16. Cumulative distribution of $q_0$, significance (Cowan, Cranmer, Gross, Vitells, arXiv:1007.1727, EPJC 71 (2011) 1554)
From the pdf, the cumulative distribution of $q_0$ is found to be
$$F(q_0 \mid \mu') = \Phi\!\left(\sqrt{q_0} - \frac{\mu'}{\sigma}\right).$$
The special case $\mu' = 0$ is
$$F(q_0 \mid 0) = \Phi\!\left(\sqrt{q_0}\right).$$
The p-value of the $\mu = 0$ hypothesis is
$$p_0 = 1 - F(q_0 \mid 0).$$
Therefore the discovery significance $Z$ is simply
$$Z = \Phi^{-1}(1 - p_0) = \sqrt{q_0}.$$
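These formulas translate directly into code. The sketch below uses hypothetical function names and parametrizes $\mu'$ through the ratio $\mu'/\sigma$. It evaluates $F(q_0 \mid \mu')$ and $p_0$. It also shows a side result of setting $F = 1/2$: the median of $q_0$ under $\mu' \ge 0$ is $(\mu'/\sigma)^2$, so the median significance is $\mu'/\sigma$.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard Gaussian, mean 0 and sigma 1


def cdf_q0(q0: float, mu_prime_over_sigma: float = 0.0) -> float:
    """Asymptotic F(q0 | mu') = Phi(sqrt(q0) - mu'/sigma), for q0 >= 0."""
    return PHI.cdf(math.sqrt(q0) - mu_prime_over_sigma)


def p0(q0: float) -> float:
    """p-value of mu = 0: 1 - Phi(sqrt(q0)), computed with erfc for small p."""
    return 0.5 * math.erfc(math.sqrt(q0) / math.sqrt(2.0))


def median_q0(mu_prime_over_sigma: float) -> float:
    """Median of q0 under mu' >= 0, from F(q_med | mu') = 1/2."""
    return mu_prime_over_sigma ** 2


print(cdf_q0(0.0))  # 0.5: the delta function at q0 = 0 carries half the probability
print(p0(25.0))     # about 2.87e-7, i.e. Z = sqrt(25) = 5
```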

  17. Monte Carlo test of asymptotic formula
Here we take $\tau = 1$. The asymptotic formula is a good approximation out to the 5σ level ($q_0 = 25$) already for $b \sim 20$.
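A check of this kind can be mimicked in a small sketch. The setup is an assumption, since the slide does not spell it out. The background is constrained by a control count $m \sim \text{Poisson}(\tau b)$, and the signal-region count is $n \sim \text{Poisson}(\mu s + b)$. Toys are generated under $\mu = 0$, and the fraction with $q_0 \ge q_{\text{test}}$ is compared with the asymptotic $p_0 = 1 - \Phi(\sqrt{q_{\text{test}}})$. At $\mu = 0$ the conditional estimate is $\hat{\hat{b}} = (n + m)/(1 + \tau)$, which gives $q_0$ in closed form. Only a 2σ threshold ($q_{\text{test}} = 4$) is used so that a modest number of toys suffices; probing 5σ this way would need vastly more toys.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for modest lam such as 20."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def q0_on_off(n: int, m: int, tau: float) -> float:
    """q0 for n ~ Pois(mu*s + b), m ~ Pois(tau*b); zero unless mu_hat > 0."""
    if n <= m / tau:
        return 0.0
    b_hh = (n + m) / (1.0 + tau)  # conditional MLE of b at mu = 0
    t_n = n * math.log(n / b_hh)
    t_m = m * math.log(m / (tau * b_hh)) if m > 0 else 0.0
    return 2.0 * (t_n + t_m)


def mc_vs_asymptotic(b: float, tau: float, q_test: float, n_toys: int, seed: int = 1) -> tuple:
    """Return (MC estimate, asymptotic value) of P(q0 >= q_test | mu = 0)."""
    rng = random.Random(seed)
    n_pass = 0
    for _ in range(n_toys):
        n = poisson(rng, b)  # background-only toys
        m = poisson(rng, tau * b)
        if q0_on_off(n, m, tau) >= q_test:
            n_pass += 1
    p_asym = 0.5 * math.erfc(math.sqrt(q_test) / math.sqrt(2.0))  # 1 - Phi(sqrt(q))
    return n_pass / n_toys, p_asym


print(mc_vs_asymptotic(b=20.0, tau=1.0, q_test=4.0, n_toys=50_000))
```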

  18. Discovery: the $p_0$ plot
The “local” $p_0$ means the p-value of the background-only hypothesis obtained from the test of $\mu = 0$ at each individual $m_H$, without any correction for the Look-Elsewhere Effect. The “Expected” (dashed) curve gives the median $p_0$ under the assumption of the SM Higgs ($\mu = 1$) at each $m_H$. The blue band gives the width (±1σ) of the distribution of significances under the assumption of the SM Higgs. [Figure: ATLAS, Phys. Lett. B 716 (2012) 1-29.]

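  19. Test statistic for upper limits (cf. Cowan, Cranmer, Gross, Vitells, arXiv:1007.1727, EPJC 71 (2011) 1554)
For purposes of setting an upper limit on $\mu$, use
$$q_\mu = \begin{cases} -2 \ln \lambda(\mu) & \hat{\mu} \le \mu, \\ 0 & \hat{\mu} > \mu, \end{cases} \qquad \text{where} \quad \lambda(\mu) = \frac{L(\mu, \hat{\hat{\boldsymbol{\theta}}}(\mu))}{L(\hat{\mu}, \hat{\boldsymbol{\theta}})}.$$
That is, when setting an upper limit, an upward fluctuation of the data is not taken to mean incompatibility with the hypothesized $\mu$. From the observed $q_\mu$, find the p-value:
$$p_\mu = \int_{q_{\mu,\mathrm{obs}}}^{\infty} f(q_\mu \mid \mu)\, dq_\mu .$$
In the large-sample approximation, $p_\mu = 1 - \Phi(\sqrt{q_\mu})$, which is independent of the nuisance parameters.
The 95% CL upper limit on $\mu$ is the highest value for which the p-value is not less than 0.05.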
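For the simplest case of a single Poisson count with known $b$ (no nuisance parameters; a hypothetical illustration), $q_\mu$ and the asymptotic $p_\mu = 1 - \Phi(\sqrt{q_\mu})$ can be coded directly. The 95% CL upper limit is then found by bisection on $p_\mu = 0.05$. The names are illustrative, and the asymptotic formula may be inaccurate for small counts.

```python
import math


def xlogy(x: float, y: float) -> float:
    """x * ln(y) with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def q_mu(mu: float, n: int, s: float, b: float) -> float:
    """Upper-limit statistic for n ~ Poisson(mu*s + b), b known."""
    mu_hat = (n - b) / s
    if mu_hat > mu:
        return 0.0  # upward fluctuation: not counted as incompatibility
    nu = mu * s + b
    # -2 ln[L(mu) / L(mu_hat)], using mu_hat*s + b = n
    return 2.0 * (nu - n + xlogy(n, n) - xlogy(n, nu))


def p_mu(mu: float, n: int, s: float, b: float) -> float:
    """Asymptotic p-value 1 - Phi(sqrt(q_mu))."""
    return 0.5 * math.erfc(math.sqrt(q_mu(mu, n, s, b)) / math.sqrt(2.0))


def asymptotic_upper_limit(n: int, s: float, b: float,
                           alpha: float = 0.05, tol: float = 1e-10) -> float:
    """Highest mu >= 0 with p_mu >= alpha; p_mu decreases for mu >= mu_hat."""
    lo = max((n - b) / s, 0.0)
    if p_mu(lo, n, s, b) < alpha:
        return lo  # even mu = 0 is excluded; returned as a sentinel
    hi = lo + 1.0
    while p_mu(hi, n, s, b) >= alpha:  # expand bracket upward
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p_mu(mid, n, s, b) >= alpha:
            lo = mid
        else:
            hi = mid
    return lo


print(asymptotic_upper_limit(n=10, s=1.0, b=10.0))
```

For $n = b = 10$ and $s = 1$ this gives an upper limit of roughly 6.14. At that value, $q_\mu$ equals $1.645^2 \approx 2.71$.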
