
Tools for Physicists: Statistics
Wolfgang Gradl, Peter Weidenkaff
Institut für Kernphysik, summer semester 2019 (SoSe 2019)

The scientific method: how we create knowledge — theory / model


1. Variance, standard deviation

Variance of a distribution: $V(x) = \int \mathrm{d}x\, P(x)\,(x-\mu)^2 = E[(x-\mu)^2]$

Variance of a data sample: $V(x) = \frac{1}{N}\sum_i (x_i-\mu)^2 = \overline{x^2} - \mu^2$

This requires knowledge of the true mean $\mu$. Replacing $\mu$ by the sample mean $\bar{x}$ results in an underestimated variance! Instead, use $\hat{V}(x) = \frac{1}{N-1}\sum_i (x_i-\bar{x})^2$

Standard deviation: $\sigma = \sqrt{V(x)}$
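A quick numerical check of the bias of the naive variance estimator (a minimal sketch using NumPy; `ddof=1` applies the $N-1$ correction):

```python
import numpy as np

rng = np.random.default_rng(42)
N, n_trials = 5, 100_000

samples = rng.normal(0.0, 1.0, size=(n_trials, N))
v_naive = samples.var(axis=1, ddof=0)   # divides by N, uses the sample mean
v_corr  = samples.var(axis=1, ddof=1)   # divides by N-1 (unbiased)

print(v_naive.mean())  # ~0.8  -> underestimates sigma^2 = 1
print(v_corr.mean())   # ~1.0  -> unbiased
```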

2. Multivariate distributions

Outcome of an experiment characterised by a tuple $(x_1, \dots, x_n)$: $P(A \cap B) = f(x, y)\,\mathrm{d}x\,\mathrm{d}y$, with $f(x, y)$ the 'joint pdf'.

Normalisation: $\int \cdots \int f(x_1, \dots, x_n)\,\mathrm{d}x_1 \cdots \mathrm{d}x_n = 1$

Sometimes only the pdf of one component is wanted: $f_1(x_1) = \int \cdots \int f(x_1, \dots, x_n)\,\mathrm{d}x_2 \cdots \mathrm{d}x_n$ ≙ projection of the joint pdf onto an individual axis.

3. Covariance and correlation

Covariance: $\mathrm{cov}[x, y] = E[(x-\mu_x)(y-\mu_y)]$

Correlation coefficient: $\rho_{xy} = \frac{\mathrm{cov}[x, y]}{\sigma_x \sigma_y}$

If $x$, $y$ are independent: $E[(x-\mu_x)(y-\mu_y)] = \int (x-\mu_x) f_x(x)\,\mathrm{d}x \int (y-\mu_y) f_y(y)\,\mathrm{d}y = 0$

Note: the converse is not necessarily true.
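A small sketch of the last point: $y = x^2$ is fully dependent on a symmetric $x$, yet the two are uncorrelated:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1_000_000)
y = x**2                        # y is completely determined by x ...

print(np.corrcoef(x, y)[0, 1])  # ... yet rho ~ 0, since E[x^3] = 0
```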

4. Covariance and correlation (figure slide)

5. Linear combinations of random variables

Consider two random variables $x$ and $y$ with known covariance $\mathrm{cov}[x, y]$:

$\langle x+y \rangle = \langle x \rangle + \langle y \rangle$, $\langle ax \rangle = a \langle x \rangle$, $V[ax] = a^2 V[x]$, $V[x+y] = V[x] + V[y] + 2\,\mathrm{cov}[x, y]$

For uncorrelated variables, simply add variances.

How about the combination of $N$ independent measurements (estimates) of a quantity, $x_i \pm \sigma$, all drawn from the same underlying distribution? Best estimate: $\bar{x} = \frac{1}{N}\sum x_i$, with $V[\bar{x}] = \frac{1}{N^2}\, N\sigma^2$, so $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}$
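A quick check (sketch) that the spread of the mean scales as $\sigma/\sqrt{N}$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 100, 20_000
means = rng.normal(0.0, 1.0, size=(trials, N)).mean(axis=1)
print(means.std(), 1.0 / np.sqrt(N))   # both ~0.1
```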

6. Combination of measurements: weighted mean

Suppose we have N independent measurements of the same quantity, but each with a different uncertainty: $x_i \pm \delta_i$.

Weighted sum: $x = w_1 x_1 + w_2 x_2$, with $\delta^2 = w_1^2 \delta_1^2 + w_2^2 \delta_2^2$

Determine the weights $w_1$, $w_2$ under the constraint $w_1 + w_2 = 1$ such that $\delta^2$ is minimised: $w_i = \frac{1/\delta_i^2}{1/\delta_1^2 + 1/\delta_2^2}$

If the original raw data of the two measurements are available, this estimate can be improved by combining the raw data; alternatively, use log-likelihood curves to combine measurements.
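A minimal inverse-variance weighted mean (sketch):

```python
import numpy as np

def weighted_mean(x, delta):
    """Inverse-variance weighted mean and its uncertainty."""
    w = 1.0 / np.asarray(delta)**2
    mean = np.sum(w * np.asarray(x)) / np.sum(w)
    return mean, 1.0 / np.sqrt(np.sum(w))

print(weighted_mean([10.2, 9.8], [0.5, 0.3]))
```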

7. Correlation ≠ causation

Chocolate consumption vs. Nobel laureates: correlation coefficient 0.791, a significant correlation (p < 0.0001); 0.4 kg/year/capita to 'produce' one additional Nobel laureate. Improved cognitive function associated with regular intake of dietary flavonoids?

F. Messerli, N Engl J Med 2012; 367:1562

8. Some important distributions

9. Gaussian

A.k.a. normal distribution:

$g(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(\frac{-(x-\mu)^2}{2\sigma^2}\right)$

Mean: $E[x] = \mu$; variance: $V[x] = \sigma^2$. (Figure: $\varphi_{\mu,\sigma^2}(x)$ for several combinations of $\mu$ and $\sigma^2$.)

Standard normal distribution: $\mu = 0$, $\sigma = 1$. The cumulative distribution is related to the error function:

$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,\mathrm{d}z = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$

In Python: scipy.stats.norm(loc, scale)

10. p-value

Probability for a Gaussian distribution corresponding to $[\mu - Z\sigma, \mu + Z\sigma]$:

$P(Z\sigma) = \int_{-Z}^{+Z} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,\mathrm{d}x = \Phi(Z) - \Phi(-Z) = \mathrm{erf}\left(\frac{Z}{\sqrt{2}}\right)$

68.27% of the area lies within ±1σ; 90% within ±1.645σ
95.45% of the area lies within ±2σ; 95% within ±1.960σ
99.73% of the area lies within ±3σ; 99% within ±2.576σ

p-value: the probability that a random process (fluctuation) produces a measurement at least this far from the true mean: $p := 1 - P(Z\sigma)$

Deviation | p-value (%)
1σ | 31.73
2σ | 4.55
3σ | 0.270
4σ | 0.006 33
5σ | 0.000 057 3

Available in ROOT: TMath::Prob(Z*Z) and Python: 2*stats.norm.sf(Z)
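The table can be reproduced directly with SciPy (a minimal sketch):

```python
from scipy import stats

for Z in (1, 2, 3, 4, 5):
    p = 2 * stats.norm.sf(Z)   # two-sided tail probability beyond Z sigma
    print(f"{Z} sigma: p = {100 * p:.6f} %")
```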

11. Why are Gaussians so useful?

Central limit theorem: the sum of n random variables approaches a Gaussian distribution for large n. This holds if the fluctuation of the sum is not dominated by the fluctuation of one (or a few) terms.

Good example: velocity component $v_x$ of air molecules
So-so example: total deflection due to multiple Coulomb scattering; rare large-angle deflections give a non-Gaussian tail
Bad example: energy loss of charged particles traversing a thin gas layer; rare collisions make up a large fraction of the energy loss ➡ Landau pdf

See the practical part of today's lecture, and the short demonstration below.
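A minimal CLT demonstration (sketch): sums of uniform variates quickly look Gaussian, with mean and variance following the rules from slide 5:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 12
s = rng.uniform(0, 1, size=(100_000, n)).sum(axis=1)

# CLT prediction: mean n/2, variance n/12 (sum of n uniform variates)
print(s.mean(), n / 2)    # ~6.0
print(s.var(),  n / 12)   # ~1.0
```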

12. Binomial distribution

N independent experiments; the outcome of each is either 'success' or 'failure'; the probability for success is p.

$f(k; N, p) = \binom{N}{k} p^k (1-p)^{N-k}$, with $E[k] = Np$ and $V[k] = Np(1-p)$

Binomial coefficient: $\binom{N}{k} = \frac{N!}{k!\,(N-k)!}$ = number of permutations with k successes in N tries.

Use the binomial distribution to model processes with two outcomes. Example: detection efficiency = #(particles seen) / #(all particles).

In the limit $N \to \infty$, $p \to 0$, $Np = \nu = \text{const}$, the binomial distribution can be approximated by a Poisson distribution, as illustrated below.
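A quick numerical look at the Poisson limit (sketch):

```python
from scipy import stats

nu, k = 3.0, 2
for N in (10, 100, 10_000):
    p = nu / N   # keep N*p = nu constant as N grows
    print(N, stats.binom.pmf(k, N, p), stats.poisson.pmf(k, nu))
```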

13. Poisson distribution

$p(k; \nu) = \frac{\nu^k}{k!}\, e^{-\nu}$, with $E[k] = \nu$ and $V[k] = \nu$

Properties:
If $n_1$, $n_2$ follow Poisson distributions, then so does $n_1 + n_2$
Can be approximated by a Gaussian for large $\nu$

Examples:
Number of Prussian cavalrymen killed by horse kicks
Clicks of a Geiger counter in a given time interval
Cars arriving at a traffic light in one minute

14. Uniform distribution

$f(x; a, b) = \begin{cases} \frac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$

Properties: $E[x] = \frac{1}{2}(a+b)$, $V[x] = \frac{1}{12}(b-a)^2$

Example: strip detector, resolution for one-strip clusters: $\text{pitch}/\sqrt{12}$

15. Exponential distribution

$f(x; \xi) = \begin{cases} \frac{1}{\xi}\, e^{-x/\xi} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$, with $E[x] = \xi$ and $V[x] = \xi^2$

Example: decay time of an unstable particle at rest, $f(t; \tau) = \frac{1}{\tau}\, e^{-t/\tau}$, with τ the mean lifetime.

Lack of memory (unique to the exponential): $f(t - t_0 \mid t \ge t_0) = f(t)$. The probability for an unstable nucleus to decay in the next minute is independent of whether the nucleus was just created or has already existed for a million years.
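A numerical check of the memoryless property (sketch): the mean remaining lifetime of the survivors equals the unconditional mean lifetime:

```python
import numpy as np

rng = np.random.default_rng(3)
t = rng.exponential(scale=2.0, size=1_000_000)

t0 = 1.5
survivors = t[t >= t0]
print((survivors - t0).mean(), t.mean())   # both ~2.0
```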

16. χ² distribution

Let $x_1, \dots, x_n$ be n independent standard normal ($\mu = 0$, $\sigma = 1$) random variables. Then the sum of their squares

$z = \sum_{i=1}^{n} x_i^2 = \sum_i \frac{(x_i' - \mu_i')^2}{\sigma_i'^2}$

follows a χ² distribution with n degrees of freedom:

$f(z; n) = \frac{z^{n/2-1}\, e^{-z/2}}{2^{n/2}\,\Gamma(n/2)}, \quad z \ge 0$, with $E[z] = n$, $V[z] = 2n$

Used to quantify goodness of fit, compatibility of measurements, …

17. Student's t distribution

Let $x_1, \dots, x_n$ be distributed as $N(\mu, \sigma)$. Sample mean and estimate of the variance: $\bar{x} = \frac{1}{n}\sum_i x_i$, $\hat{\sigma}^2 = \frac{1}{n-1}\sum_i (x_i - \bar{x})^2$

We don't know the true µ and therefore have to estimate the variance by $\hat{\sigma}$. While $\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ follows $N(0, 1)$, the quantity $\frac{\bar{x} - \mu}{\hat{\sigma}/\sqrt{n}}$ is not Gaussian: it follows Student's t distribution with $n-1$ degrees of freedom,

$f(t; n) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\!\left(\frac{n}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}$

For $n \to \infty$, $f(t; n) \to N(t; 0, 1)$.

Applications:
Hypothesis tests: assess the statistical significance of the difference between two sample means
Set confidence intervals (more of that later)

18. Landau distribution

Describes the energy loss of a (heavy) charged particle in a thin layer of material due to ionisation; the tail with large energy loss is due to occasional high-energy scattering, e.g. the creation of delta rays.

$f(\lambda) = \frac{1}{\pi} \int_0^{\infty} \exp(-u \ln u - \lambda u) \sin(\pi u)\,\mathrm{d}u, \quad \lambda = \frac{\Delta - \Delta_0}{\xi}$

Δ: actual energy loss; Δ₀: location parameter; ξ: material property

Unpleasant: mean and variance (all moments, really) are not defined.

19. Delta rays (figure: Julien Simon, CC-BY-SA 3.0)

20. Parameter estimation

Parameters of a pdf are constants that characterise its shape, e.g. $f(x; \theta) = \frac{1}{\theta}\, e^{-x/\theta}$, where x is the random variable and θ the parameter. (Figure: projection of exp(−t/τ) versus time.)

Suppose we have a sample of observed values $\vec{x} = (x_1, \dots, x_n)$, independent, identically distributed (i.i.d.). We want to find some function of the data to estimate the parameters: $\hat{\theta}(\vec{x})$. Often, θ is also a vector of parameters.

21. Properties of estimators

Consistency: an estimator is consistent if it converges to the true value, $\lim_{n \to \infty} \hat{\theta} = \theta$

Bias: the difference between the expectation value of the estimator and the true value, $b \equiv E[\hat{\theta}] - \theta$ (example: estimators for the lifetime of a particle)

Efficiency: an estimator is efficient if its variance $V[\hat{\theta}]$ is small

22. Unbiased estimators for mean and variance

Estimator for the mean: $\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, with $b = E[\hat{\mu}] - \mu = 0$ and $V[\hat{\mu}] = \frac{\sigma^2}{n}$, i.e. $\sigma_{\hat{\mu}} = \frac{\sigma}{\sqrt{n}}$

Estimator for the variance: $s^2 = \hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, with $b = E[s^2] - \sigma^2 = 0$ and

$V[s^2] = \frac{\sigma^4}{n}\left((\kappa - 1) + \frac{2}{n-1}\right)$, where $\kappa = \mu_4/\sigma^4$ is the kurtosis.

Note: even though $s^2$ is an unbiased estimator for the variance $\sigma^2$, $s$ is a biased estimator for the standard deviation $\sigma$.

23. Likelihood function for i.i.d. data

Suppose we have a measurement of n independent values (i.i.d.) $\vec{x} = (x_1, \dots, x_n)$, drawn from the same distribution $f(x; \vec{\theta})$, $\vec{\theta} = (\theta_1, \dots, \theta_m)$.

The joint pdf for the observed values $\vec{x}$ is given by the likelihood function $L(\vec{x}; \vec{\theta}) = \prod_{i=1}^{n} f(x_i; \vec{\theta})$

Consider $\vec{x}$ as constant. The maximum likelihood estimates (MLE) of the parameters are the values $\vec{\theta}$ for which $L(\vec{x}; \vec{\theta})$ has a global maximum.

For practical reasons, one usually uses $\log L$ (computers cope with a sum of small numbers much better than with a product of small numbers).

24. ML example: exponential decay

Consider the exponential pdf $f(t; \tau) = \frac{1}{\tau}\, e^{-t/\tau}$ and independent measurements drawn from this distribution: $t_1, t_2, \dots, t_n$.

Likelihood function: $L(\tau) = \prod_i \frac{1}{\tau}\, e^{-t_i/\tau}$

$L(\tau)$ is maximal where $\log L(\tau)$ is maximal: $\log L(\tau) = \sum_{i=1}^{n} \log f(t_i; \tau) = \sum_{i=1}^{n} \left(\log\frac{1}{\tau} - \frac{t_i}{\tau}\right)$

Find the maximum: $\frac{\partial \log L(\tau)}{\partial \tau} = \sum_{i=1}^{n} \left(-\frac{1}{\tau} + \frac{t_i}{\tau^2}\right) = 0 \;\Rightarrow\; \hat{\tau} = \frac{1}{n}\sum_i t_i$
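The MLE is just the sample mean; a short sketch confirms this by direct numerical minimisation of $-\log L$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
t = rng.exponential(scale=2.0, size=1000)

def nll(tau):
    """Negative log-likelihood of the exponential pdf (constants dropped)."""
    return len(t) * np.log(tau) + t.sum() / tau

res = minimize_scalar(nll, bounds=(0.1, 10.0), method="bounded")
print(res.x, t.mean())   # numerical MLE agrees with the analytic result
```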

25. ML example: Gaussian

Consider $x_1, \dots, x_n$ drawn from a Gaussian $(\mu, \sigma^2)$: $f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Log-likelihood function: $\log L(\mu, \sigma^2) = \sum_i \log f(x_i; \mu, \sigma^2) = \sum_i \left(\log\frac{1}{\sqrt{2\pi}} - \log\sigma - \frac{(x_i - \mu)^2}{2\sigma^2}\right)$

Derivatives w.r.t. µ and σ²: $\frac{\partial \log L}{\partial \mu} = \sum_i \frac{x_i - \mu}{\sigma^2}; \quad \frac{\partial \log L}{\partial \sigma^2} = \sum_i \left(\frac{(x_i - \mu)^2}{2\sigma^4} - \frac{1}{2\sigma^2}\right)$

26. ML example: Gaussian (continued)

Setting the derivatives w.r.t. µ and σ² to zero and solving the equations: $\hat{\mu} = \frac{1}{n}\sum_i x_i$; $\widehat{\sigma^2} = \frac{1}{n}\sum_i (x_i - \hat{\mu})^2$

We find that the ML estimator for σ² is biased!

27. Properties of the ML estimator

The ML estimator is consistent, i.e. it approaches the true value asymptotically
In general, the ML estimator is biased for finite n (need to check this)
The ML estimator is invariant under parameter transformation: for $\psi = g(\theta)$, $\hat{\psi} = g(\hat{\theta})$

28. Averaging measurements with Gaussian uncertainties

Assume n measurements, same mean µ, but different resolutions $\sigma_i$: $f(x; \mu, \sigma_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\frac{(x-\mu)^2}{2\sigma_i^2}}$

Log-likelihood, similar to before: $\log L(\mu) = \sum_i \left(\log\frac{1}{\sqrt{2\pi}} - \log\sigma_i - \frac{(x_i - \mu)^2}{2\sigma_i^2}\right)$

From $\left.\frac{\partial \log L(\mu)}{\partial \mu}\right|_{\mu=\hat{\mu}} \stackrel{!}{=} 0$ we obtain the formula for the weighted average, as before: $\hat{\mu} = \frac{\sum_i x_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}$

29. Averaging measurements with Gaussian uncertainties (continued)

Uncertainty? The Taylor expansion is exact, because $\log L(\mu)$ is a parabola:

$\log L(\mu) = \log L(\hat{\mu}) + \underbrace{\left.\frac{\partial \log L}{\partial \mu}\right|_{\mu=\hat{\mu}}}_{=0}(\mu - \hat{\mu}) - \frac{h}{2}(\mu - \hat{\mu})^2, \quad h = -\left.\frac{\partial^2 \log L(\mu)}{\partial \mu^2}\right|_{\mu=\hat{\mu}}$

This means that the likelihood function is a Gaussian, $L(\mu) \propto \exp\left(-\frac{h}{2}(\mu - \hat{\mu})^2\right)$, with standard deviation

$\sigma_{\hat{\mu}} = 1/\sqrt{h} = \left[-\left.\frac{\partial^2 \log L(\mu)}{\partial \mu^2}\right|_{\mu=\hat{\mu}}\right]^{-1/2}$, and since $h = \sum_i \frac{1}{\sigma_i^2}$: $\sigma_{\hat{\mu}} = \left(\sum_i \frac{1}{\sigma_i^2}\right)^{-1/2}$

30. Uncertainty bounds

Likelihood function with only one parameter: $L(\vec{x}; \theta) = L(x_1, \dots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta)$, and $\hat{\theta}$ an estimator of the parameter θ.

Without proof: it can be shown that the variance of a (biased, with bias b) estimator satisfies

$V[\hat{\theta}] \ge \frac{\left(1 + \frac{\partial b}{\partial \theta}\right)^2}{E\left[-\frac{\partial^2 \log L}{\partial \theta^2}\right]}$

the Cramér-Rao minimum variance bound (MVB).

31. Uncertainty of the MLE: approach I

Approximation: $E\left[-\frac{\partial^2 \log L}{\partial \theta^2}\right] \approx -\left.\frac{\partial^2 \log L}{\partial \theta^2}\right|_{\theta=\hat{\theta}}$, good for large n (and away from any explicit boundaries on θ).

In this approximation, the variance of the ML estimator is given by $V[\hat{\theta}] = \left[-\left.\frac{\partial^2 \log L}{\partial \theta^2}\right|_{\theta=\hat{\theta}}\right]^{-1}$, so we only need to evaluate the second derivative of $\log L$ at its maximum.

32. Uncertainty of the MLE: approach II ('graphical method')

Taylor expansion of $\log L$ around the maximum:

$\log L(\theta) = \log L(\hat{\theta}) + \underbrace{\left.\frac{\partial \log L}{\partial \theta}\right|_{\theta=\hat{\theta}}}_{=0}(\theta - \hat{\theta}) + \frac{1}{2}\left.\frac{\partial^2 \log L}{\partial \theta^2}\right|_{\theta=\hat{\theta}}(\theta - \hat{\theta})^2 + \cdots$

If L is approximately Gaussian ($\log L$ approximately a parabola): $\log L(\theta) \approx \log L_{\max} - \frac{(\theta - \hat{\theta})^2}{2\hat{\sigma}^2_{\hat{\theta}}}$

Estimate the uncertainties from the points where $\log L$ has dropped by 1/2 from its maximum: $\log L(\hat{\theta} \pm \hat{\sigma}_{\hat{\theta}}) \approx \log L_{\max} - \frac{1}{2}$

This can be used even if L(θ) is not Gaussian. If L(θ) is Gaussian, the results of approaches I & II are identical.
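A minimal sketch of the graphical method for the exponential example: scan the log-likelihood and find where it drops by 1/2 from the maximum:

```python
import numpy as np

rng = np.random.default_rng(11)
t = rng.exponential(scale=2.0, size=20)

def logL(tau):
    return -len(t) * np.log(tau) - t.sum() / tau

tau_hat = t.mean()                     # analytic MLE
taus = np.linspace(0.5 * tau_hat, 2.5 * tau_hat, 10_000)
dll = logL(taus) - logL(tau_hat)

inside = taus[dll >= -0.5]             # interval where log L >= log L_max - 1/2
print(tau_hat, inside.min() - tau_hat, inside.max() - tau_hat)  # asymmetric errors
```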

33. Example: uncertainty of the decay time for an exponential decay

Variance of the estimated decay time:

$\frac{\partial^2 \log L(\tau)}{\partial \tau^2} = \sum_i \left(\frac{1}{\tau^2} - \frac{2 t_i}{\tau^3}\right) = \frac{n}{\tau^2} - \frac{2}{\tau^3}\sum_i t_i = \frac{n}{\tau^2}\left(1 - \frac{2\hat{\tau}}{\tau}\right)$

Thus, $V[\hat{\tau}] = \left[-\left.\frac{\partial^2 \log L(\tau)}{\partial \tau^2}\right|_{\tau=\hat{\tau}}\right]^{-1} = \frac{\hat{\tau}^2}{n} \;\Rightarrow\; \sigma_{\hat{\tau}} = \frac{\hat{\tau}}{\sqrt{n}}$

34. Exponential decay: illustration

20 data points sampled from $f(t; \tau) = \frac{1}{\tau}\, e^{-t/\tau}$ with τ = 2. (Figures: histogram of the data; projection of log L(τ).)

ML estimate: $\hat{\tau} = 1.65$, $\hat{\sigma} = 1.65/\sqrt{20} = 0.37$, using the quadratic approximation of L(τ).

Or, using the shape of the log L curve and the 'log L − 1/2' prescription: asymmetric errors, $\hat{\tau} = 1.65^{+0.47}_{-0.34}$

35. Exponential decay: log L for different sample sizes

(Figures: projection of log L(τ) for 10 data points and for 500 data points.)

With 10 data points, the quadratic approximation for log L is not very good; with 500 data points, it is excellent.

36. Minimum variance bound for m parameters

$f(x; \vec{\theta})$, $\vec{\theta} = (\theta_1, \dots, \theta_m)$

The minimum variance bound is related to the Fisher information matrix:

$I_{jk}[\vec{\theta}] = -E\left[\frac{\partial^2 \log L(\vec{\theta})}{\partial \theta_j\, \partial \theta_k}\right] = -E\left[\sum_i \frac{\partial^2 \log f(x_i; \vec{\theta})}{\partial \theta_j\, \partial \theta_k}\right]; \quad V[\hat{\theta}_j] \ge (I[\vec{\theta}]^{-1})_{jj}$

37. Variance of the ML estimator for m parameters

In the limit of large sample size, L approaches a multivariate Gaussian distribution for any probability density:

$L(\vec{\theta}) \propto \exp\left(-\frac{1}{2}(\vec{\theta} - \hat{\vec{\theta}})^T\, V^{-1}[\hat{\vec{\theta}}]\,(\vec{\theta} - \hat{\vec{\theta}})\right)$

The variance of the ML estimator reaches the MVB (minimum variance bound), related to the Fisher information matrix: $V[\hat{\theta}] \to I(\theta)^{-1}$, with $I_{jk}[\vec{\theta}] = -E\left[\frac{\partial^2 \log L(\vec{\theta})}{\partial \theta_j\, \partial \theta_k}\right]$

Covariance matrix of the estimated parameters: $V[\hat{\vec{\theta}}] \approx \left[-\left.\frac{\partial^2 \log L(\vec{x}; \vec{\theta})}{\partial \vec{\theta}^2}\right|_{\vec{\theta}=\hat{\vec{\theta}}}\right]^{-1}$

Standard deviation of a single parameter: $\hat{\sigma}_{\hat{\theta}_j} = \sqrt{(V[\hat{\vec{\theta}}])_{jj}}$

38. MLE in practice: numeric minimisation

An analytic expression for L(θ) and its derivatives is often not easily known. Use a generic minimiser like MINUIT to find the (global) minimum of −log L(θ). It typically uses a gradient-descent method to find the minimum and then scans around the minimum to obtain the L − 1/2 contour (make sure you don't get stuck in a local minimum). ➡ see today's practical part for a hands-on, and the short sketch below
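MINUIT is available from Python through the iminuit package; a minimal sketch (assuming iminuit 2.x) for the exponential lifetime fit:

```python
import numpy as np
from iminuit import Minuit

rng = np.random.default_rng(13)
t = rng.exponential(scale=2.0, size=500)

def nll(tau):
    """Negative log-likelihood of the exponential model."""
    return len(t) * np.log(tau) + t.sum() / tau

m = Minuit(nll, tau=1.0)
m.errordef = Minuit.LIKELIHOOD    # -log L convention: errors at -log L_min + 1/2
m.limits["tau"] = (1e-3, None)    # keep tau positive
m.migrad()                        # minimisation
m.hesse()                         # parabolic errors from the 2nd derivative
m.minos()                         # asymmetric errors from the log L - 1/2 scan
print(m.values["tau"], m.errors["tau"], m.merrors["tau"])
```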

39. Bounds on parameters in MINUIT

Sometimes you may want to bound the allowed range of fit parameters, e.g. to prevent (numerical) instabilities or unphysical results ('fraction f should be in [0, 1]', 'mass ≥ 0').

MINUIT internally transforms a bounded parameter y with an arcsin(y) function to an unbounded parameter x.

40. Bounds on parameters in MINUIT (continued)

If the fitted parameter value is close to a boundary, the errors will become asymmetric and maybe even incorrect. Try to find an alternative parametrisation to avoid the region of instability: e.g. for a complex number $z = r e^{i\phi}$ with bounds $r \ge 0$, $0 \le \phi < 2\pi$, the parametrisation $z = x + iy$ may be better behaved.

If bounds were placed to avoid an 'unphysical' region, consider not imposing the limits and dealing with the restriction to the physical region after the fit.

41. Extended ML method

In the standard ML method, information about the unknown parameters is encoded in the shape of the distribution of the data. Sometimes the number of observed events also contains information about the parameters (e.g. when measuring a decay rate).

Normal ML method: $\int f(x; \vec{\theta})\,\mathrm{d}x = 1$
Extended ML method: $\int q(x; \vec{\theta})\,\mathrm{d}x = \nu(\vec{\theta})$ = predicted number of events

42. Extended ML method (II)

The likelihood function becomes: $L(\vec{\theta}) = \frac{\nu^n e^{-\nu}}{n!} \prod_i f(x_i; \vec{\theta})$, where $\nu \equiv \nu(\vec{\theta})$

And the log-likelihood function: $\log L(\vec{\theta}) = -\log(n!) - \nu(\vec{\theta}) + \sum_i \log\left[f(x_i; \vec{\theta})\,\nu(\vec{\theta})\right]$

$\log n!$ does not depend on the parameters and can be omitted in the minimisation.

43. Application of the extended ML method

Example: two-component fit (signal + background); unbinned ML fit, histogram for visualisation only. We want to obtain a meaningful estimate of the uncertainties of the signal and background yields. (Figure: fit to the data.)

Normalised pdf: $f(x; r_s, \vec{\theta}) = r_s f_s(x; \vec{\theta}) + (1 - r_s) f_b(x; \vec{\theta})$, with $r_s = \frac{s}{s+b}$, $r_b = 1 - r_s = \frac{b}{s+b}$

$-\log \tilde{L}(s, b, \vec{\theta}) = s + b - \sum_i \log\left[s\, f_s(x_i; \vec{\theta}) + b\, f_b(x_i; \vec{\theta})\right]$
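The last formula translates directly into code; a minimal sketch, where `f_sig` and `f_bkg` are placeholders for normalised pdfs evaluated on the data:

```python
import numpy as np

def extended_nll(s, b, data, f_sig, f_bkg):
    """-log L~ for yields s, b and normalised pdfs f_sig, f_bkg;
    the parameter-independent log(n!) term is dropped."""
    density = s * f_sig(data) + b * f_bkg(data)
    return (s + b) - np.sum(np.log(density))
```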

44. Application of the extended ML method (II)

We could have just fitted the normalised pdf, with $r_s$ as an additional parameter. A good estimate of the number of signal events is then $r_s n$. However, $\sigma_{r_s} n$ is not a good estimate for the variation of the number of signal events: it ignores the fluctuations of n. Using the extended ML method fixes this.

45. Least squares from ML

Consider n measured values $y_1(x_1), y_2(x_2), \dots, y_n(x_n)$, assumed to be independent Gaussian random variables with known variances, $V[y_i] = \sigma_i^2$. (Figure: data points with error bars.)

Assume we have a model for the functional dependence of $y_i$ on $x_i$: $E[y_i] = f(x_i; \vec{\theta})$. We want to estimate $\vec{\theta}$.

Likelihood function: $L(\vec{\theta}) = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[-\frac{1}{2}\left(\frac{y_i - f(x_i; \vec{\theta})}{\sigma_i}\right)^2\right]$

46. Least squares from ML (II)

Log-likelihood function: $\log L(\vec{\theta}) = -\frac{1}{2}\sum_i \left(\frac{y_i - f(x_i; \vec{\theta})}{\sigma_i}\right)^2 + \text{terms not depending on } \vec{\theta}$

Maximising this is equivalent to minimising

$\chi^2(\vec{\theta}) = \sum_i \left(\frac{y_i - f(x_i; \vec{\theta})}{\sigma_i}\right)^2$

so, for Gaussian uncertainties, the method of least squares coincides with the maximum likelihood method.

Error definition: points where $\chi^2 = \chi^2_{\min} + Z^2$ for a Zσ interval (compare: $\log L = \log L_{\max} - \frac{1}{2}Z^2$ for the MLE)

47. Linear least squares

Important special case: consider a function linear in the parameters, $f(x; \vec{\theta}) = \sum_j a_j(x)\,\theta_j$, with n data points and m parameters.

χ² in matrix form: $\chi^2 = (\vec{y} - A\vec{\theta})^T V^{-1} (\vec{y} - A\vec{\theta})$, with $A_{ij} = a_j(x_i)$
$= \vec{y}^T V^{-1}\vec{y} - 2\,\vec{y}^T V^{-1} A\vec{\theta} + \vec{\theta}^T A^T V^{-1} A\,\vec{\theta}$

Set the derivatives w.r.t. $\theta_i$ to zero: $\nabla \chi^2 = -2\left(A^T V^{-1}\vec{y} - A^T V^{-1} A\vec{\theta}\right) = 0$

Solution: $\vec{\theta} = (A^T V^{-1} A)^{-1} A^T V^{-1}\vec{y} \equiv L\,\vec{y}$

48. Linear least squares (continued)

Covariance matrix U of the parameters, from error propagation (exact, because the estimated parameter vector is a linear function of the data points $y_i$): $U = L V L^T = (A^T V^{-1} A)^{-1}$

Equivalently, calculate numerically $(U^{-1})_{ij} = \frac{1}{2}\left.\frac{\partial^2 \chi^2}{\partial \theta_i\, \partial \theta_j}\right|_{\vec{\theta}=\hat{\vec{\theta}}}$

49. Example: straight line fit

$y = \theta_0 + \theta_1 x$

The conditions $\partial \chi^2/\partial \theta_0 = 0$ and $\partial \chi^2/\partial \theta_1 = 0$ yield two linear equations in two variables that are easy to solve. With the shorthand notation $[z] := \sum_i \frac{z_i}{\sigma_i^2}$ we finally obtain

$\hat{\theta}_0 = \frac{[x^2][y] - [x][xy]}{[1][x^2] - [x][x]}, \quad \hat{\theta}_1 = \frac{-[x][y] + [1][xy]}{[1][x^2] - [x][x]}$

Simple, huh? At least it's easy to program and compute, given a set of data. (The complete calculation is in the appendix of the slides.)

50. Example: straight line fit

Analytic fit result: $\hat{\theta}_0 = \frac{[x^2][y] - [x][xy]}{[1][x^2] - [x][x]} = 1.16207$, $\hat{\theta}_1 = \frac{-[x][y] + [1][xy]}{[1][x^2] - [x][x]} = 0.613945$

Covariance matrix of $(\theta_0, \theta_1)$: $U = (A^T V^{-1} A)^{-1} = \begin{pmatrix} 0.211186 & -0.0646035 \\ -0.0646035 & 0.0234105 \end{pmatrix}$

Data:
x | y | σ_y
1 | 1.7 | 0.5
2 | 2.3 | 0.3
3 | 3.5 | 0.4
4 | 3.3 | 0.4
5 | 4.3 | 0.6
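The matrix solution is only a few lines of NumPy; this sketch should reproduce the numbers above:

```python
import numpy as np

x  = np.array([1, 2, 3, 4, 5], dtype=float)
y  = np.array([1.7, 2.3, 3.5, 3.3, 4.3])
sy = np.array([0.5, 0.3, 0.4, 0.4, 0.6])

A    = np.column_stack([np.ones_like(x), x])   # a_0(x) = 1, a_1(x) = x
Vinv = np.diag(1.0 / sy**2)

U     = np.linalg.inv(A.T @ Vinv @ A)          # covariance of (theta0, theta1)
theta = U @ A.T @ Vinv @ y
print(theta)   # expected: [1.16207, 0.613945]
print(U)
```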

51. Example: straight line fit

A numerical estimate with MINUIT, on the same data as above, gives the same result. (Figure: fitted line with data points.)

52. Fitting binned data

A very popular application of the least-squares fit: fit a model (curve) to binned data (a histogram). The number of events occurring in each bin j is assumed to follow a Poisson distribution with mean $f_j$:

$\chi^2 = \sum_{j=1}^{m} \frac{(n_j - f_j)^2}{f_j}$

A further common simplification is the 'modified least-squares method', assuming $\sigma^2_{n_j} = n_j$:

$\chi^2 \approx \sum_{j=1}^{m} \frac{(n_j - f_j)^2}{n_j}$

You can get away with this when all $n_j$ are sufficiently large, but what about bins with small contents, or even zero events? ➡ Frequently, bins with $n_j = 0$ are simply excluded. This throws away information and will lead to biased fit results!
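For small bin contents, a binned Poisson likelihood avoids both problems; a minimal sketch (constant terms dropped, $n_j = 0$ bins handled naturally):

```python
import numpy as np

def binned_poisson_nll(n, f):
    """-log L for observed counts n and predicted bin means f (> 0),
    dropping parameter-independent terms; n_j = 0 bins contribute f_j."""
    n, f = np.asarray(n, float), np.asarray(f, float)
    return np.sum(f - np.where(n > 0, n * np.log(f), 0.0))
```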

53. Fitting binned data (continued)

Example (S. Oser, https://www.phas.ubc.ca/~oser/p509/Lec_09.pdf): exponential distribution, 100 events; red: true distribution, black: fit.

The more bins you have with small statistics, the worse the MLS fit becomes. The ML method gives more reliable results in this case. If you must use MLS, then at least rebin your data, at the loss of information.

54. Discussion of fit methods

Unbinned maximum likelihood fit:
+ no need to bin data (makes full use of the information in the data)
+ works naturally with multi-dimensional data
+ no Gaussian assumption
+ works with small statistics
− no direct goodness-of-fit estimate
− can be computationally expensive, especially with high statistics
− visualisation of data and fit needs a bit of thought

Least squares fit:
+ fast, robust, easy
+ goodness of fit 'free of charge'
+ can plot fit with data easily
+ works fine at high statistics (computationally cheap)
− assumes Gaussian/Poissonian errors (this breaks down if the bin content is too small)
− suffers from the curse of dimensionality
− blind to features smaller than the bin size

55. Practical estimation — verifying the validity of your fits

You want to demonstrate that your fit procedure gives, on average, the correct answer (no bias), and that the uncertainty quoted by your fit is an accurate measure of the statistical spread in your measurement (correct error).

Validation is particularly important for low-statistics fits: the intrinsic ML bias is proportional to 1/n. It is also important for problems with multi-dimensional observables: mis-modelled correlations between observables can lead to bias.

56. Basic validation strategy

Simulation study:
1. Obtain a (very) large sample of simulated events
2. Divide the simulated events into O(100-1000) independent samples with the same size as the problem under study
3. Repeat the fit procedure for each data-sized simulated sample
4. Compare the average of the fitted parameter values with the generated value ➠ demonstrate (absence of) bias
5. Compare the spread in the fitted parameter values with the quoted parameter error ➠ demonstrate (in)correctness of the error

A minimal toy-study skeleton is sketched below.
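A sketch of such a toy study for the exponential lifetime fit (pull mean ~0 and pull width ~1 indicate an unbiased fit with correct errors; the pull is defined formally two slides below):

```python
import numpy as np

rng = np.random.default_rng(17)
tau_gen, n_events, n_toys = 2.0, 200, 1000

pulls = []
for _ in range(n_toys):
    t = rng.exponential(scale=tau_gen, size=n_events)
    tau_hat = t.mean()                         # analytic MLE
    sigma_hat = tau_hat / np.sqrt(n_events)    # parabolic error
    pulls.append((tau_hat - tau_gen) / sigma_hat)

pulls = np.array(pulls)
print(pulls.mean(), pulls.std())   # expect ~0 and ~1
```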

57. Practical example — validation study

Example fit model in 1D (B mass): the signal component is a Gaussian centred at the B mass; the background component is an ARGUS function (models phase space near the kinematic limit):

$q(m; n_{\mathrm{sig}}, n_{\mathrm{bkg}}, \vec{p}_{\mathrm{sig}}, \vec{p}_{\mathrm{bkg}}) = n_{\mathrm{sig}}\, G(m; \vec{p}_{\mathrm{sig}}) + n_{\mathrm{bkg}}\, A(m; \vec{p}_{\mathrm{bkg}})$

Fit parameter under study: $n_{\mathrm{sig}}$. Result of the simulation study, 1000 experiments with $\langle n^{\mathrm{gen}}_{\mathrm{sig}} \rangle = 200$, $\langle n^{\mathrm{gen}}_{\mathrm{bkg}} \rangle = 800$: the distribution of $n^{\mathrm{fit}}_{\mathrm{sig}}$ … looks good. (Figures: example fit; distribution of fitted signal yields.)

58. Validation study — pull distribution

What about the validity of the error estimate? The distribution of the error from simulated experiments is difficult to interpret … we don't have an equivalent of $n^{\mathrm{gen}}_{\mathrm{sig}}$ for the error.

Solution: look at the pull distribution. Definition:

$\mathrm{pull}(n_{\mathrm{sig}}) \equiv \frac{n^{\mathrm{fit}}_{\mathrm{sig}} - n^{\mathrm{gen}}_{\mathrm{sig}}}{\sigma^{\mathrm{fit}}_n}$

Properties of the pull: its mean is 0 if there is no bias; its width is 1 if the error is correct. (Figures: error distribution; pull distribution with pullMean = −0.0246 ± 0.030, pullSigma = 0.954 ± 0.021.)

59. Validation study — extended ML!

As an aside, this toy study was also run with the standard (not extended) ML method. (Figures: yield and pull distributions for both methods.)

Extended: σ(pull) = 0.954 ± 0.021; standard: σ(pull) = 0.100 ± 0.000051 — far from 1, so the standard fit's quoted error does not reflect the actual spread of the signal yield.

60. Validation study — low-statistics example

Special care needs to be taken when fitting small data samples, and also when fitting a small signal component in a large sample.

Possible causes of trouble:
χ² estimators become approximate as the Gaussian approximation of Poisson statistics becomes inaccurate
ML estimators may no longer be efficient, making the error estimate from the 2nd derivative inaccurate
The bias term ∝ 1/n may no longer be small compared to 1/√n

In general, absence of bias and correctness of the error cannot be assumed. Use unbinned ML fits wherever possible (more robust), and explicitly verify the validity of your fit.

61. Fit bias at low n

Low-statistics example: the model as before, but with $\langle n^{\mathrm{gen}}_{\mathrm{sig}} \rangle = 20$.

Result of the simulation study (figures: example fit; yield, error, and pull distributions, with pullMean = 0.096 ± 0.032, pullSigma = 1.023 ± 0.023): the distributions become asymmetric at low statistics, and the fit is positively biased.

62. Validation study — how to obtain 10⁷ simulated events?

Practical issue: one usually needs very large amounts of simulated events for a fit validation study, of order 1000 × (number of events in data), easily > 10⁶ events. Using data generated through full (GEANT-based) detector simulation can be prohibitively expensive.

Solution: sample events directly from the fit function, a technique called toy Monte Carlo sampling.

Advantages: easy to do, very fast; good for determining fit bias due to low statistics, choice of parametrisation, bounds on parameters, …
It cannot test the assumptions built into the fit model (absence of correlations between observables, …); you still need full simulation for this.

63. Summary of today's lecture

A powerful tool to estimate parameters of distributions: the maximum likelihood method
In the limit of large statistics, the least-squares method is equivalent to the MLE
Linear least squares: analytical solution!
How to decide whether the model is appropriate in the first place: next week! (goodness of fit, hypothesis testing, …)
Whatever you use, validate your fit: demonstrate the absence of bias and the correctness of the error estimate

64. Next week: how can we choose the 'best' fit model?

65. Addendum: linear least squares (I)

Fit model: $y = \theta_1 x + \theta_0$. Apply the general solution developed for the linear least squares fit: $A_{ij} = a_j(x_i)$, $\hat{L} = (A^T V^{-1} A)^{-1} A^T V^{-1}$, $\vec{\theta} = L\vec{y}$

$A^T = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix}; \quad V^{-1} = \mathrm{diag}\!\left(1/\sigma_1^2, 1/\sigma_2^2, \dots, 1/\sigma_n^2\right)$

$A^T V^{-1} = \begin{pmatrix} 1/\sigma_1^2 & 1/\sigma_2^2 & \cdots & 1/\sigma_n^2 \\ x_1/\sigma_1^2 & x_2/\sigma_2^2 & \cdots & x_n/\sigma_n^2 \end{pmatrix}$

$A^T V^{-1} A = \begin{pmatrix} \sum_i 1/\sigma_i^2 & \sum_i x_i/\sigma_i^2 \\ \sum_i x_i/\sigma_i^2 & \sum_i x_i^2/\sigma_i^2 \end{pmatrix}$

66. Addendum: linear least squares (II)

The 2 × 2 matrix is easy to invert. Using the shorthand notation $[z] = \sum_i z_i/\sigma_i^2$:

$(A^T V^{-1} A)^{-1} = \frac{1}{[1][x^2] - [x][x]} \begin{pmatrix} [x^2] & -[x] \\ -[x] & [1] \end{pmatrix}$

And therefore

$L = (A^T V^{-1} A)^{-1} A^T V^{-1} = \frac{1}{[1][x^2] - [x][x]} \begin{pmatrix} \frac{[x^2] - [x]x_1}{\sigma_1^2} & \cdots & \frac{[x^2] - [x]x_n}{\sigma_n^2} \\ \frac{-[x] + [1]x_1}{\sigma_1^2} & \cdots & \frac{-[x] + [1]x_n}{\sigma_n^2} \end{pmatrix}$

And finally: $\hat{\theta}_0 = \frac{[x^2][y] - [x][xy]}{[1][x^2] - [x][x]}, \quad \hat{\theta}_1 = \frac{-[x][y] + [1][xy]}{[1][x^2] - [x][x]}$

67. Best Linear Unbiased Estimate (BLUE)

We have seen how to combine uncorrelated measurements. Now consider n data points $\vec{y} = (y_1, \dots, y_n)$ with covariance matrix V. Calculate the weighted average λ by minimising

$\chi^2(\lambda) = (\vec{y} - \vec{\lambda})^T V^{-1} (\vec{y} - \vec{\lambda}), \quad \vec{\lambda} = (\lambda, \dots, \lambda)$

Result: $\hat{\lambda} = \sum_i w_i y_i$, with $w_i = \frac{\sum_k (V^{-1})_{ik}}{\sum_{k,l} (V^{-1})_{kl}}$

Variance: $\sigma^2_{\hat{\lambda}} = \vec{w}^T V \vec{w} = \sum_{i,j} w_i V_{ij} w_j$

This is the best linear unbiased estimator, i.e. the linear unbiased estimator with the lowest variance.
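The result is a two-line function; a minimal sketch:

```python
import numpy as np

def blue(y, V):
    """Best linear unbiased estimate of a common mean and its error."""
    Vinv = np.linalg.inv(V)
    w = Vinv.sum(axis=1) / Vinv.sum()   # BLUE weights
    return w @ y, np.sqrt(w @ V @ w)

# uncorrelated case: reduces to the inverse-variance weighted mean
print(blue(np.array([10.2, 9.8]), np.diag([0.5**2, 0.3**2])))
```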

68. BLUE special case: two correlated measurements

Consider two measurements $y_1$, $y_2$ with covariance matrix (ρ is the correlation coefficient)

$V = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}; \quad V^{-1} = \frac{1}{1-\rho^2} \begin{pmatrix} \frac{1}{\sigma_1^2} & \frac{-\rho}{\sigma_1\sigma_2} \\ \frac{-\rho}{\sigma_1\sigma_2} & \frac{1}{\sigma_2^2} \end{pmatrix}$

Applying the formulas from above: $\hat{\lambda} = w y_1 + (1-w) y_2$, with

$w = \frac{\sigma_2^2 - \rho\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}; \quad V[\hat{\lambda}] = \sigma^2 = \frac{(1-\rho^2)\,\sigma_1^2\sigma_2^2}{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}$

69. Weighted average of correlated measurements: interesting example

Adapted from Cowan's book and Scott Oser's lecture: measure the length of an object with two rulers. Both are calibrated to be accurate at temperature $T = T_0$, but otherwise have a temperature dependency: the true length y is related to the measured length L by $y_i = L_i + c_i(T - T_0)$.

Assume that we know the $c_i$ and the (Gaussian) uncertainties. We measure $L_1$, $L_2$, and T, and want to combine the measurements to get the best estimate of the true length.

70. Weighted average of correlated measurements: interesting example (II)

Start by forming the covariance matrix of the two measurements:

$y_i = L_i + c_i(T - T_0); \quad \sigma_i^2 = \sigma_L^2 + c_i^2 \sigma_T^2; \quad \mathrm{cov}[y_1, y_2] = c_1 c_2 \sigma_T^2$

Use the following parameter values, just for concreteness:
c₁ = 0.1, L₁ = 2.0 ± 0.1, y₁ = 1.80 ± 0.22, T₀ = 25
c₂ = 0.2, L₂ = 2.3 ± 0.1, y₂ = 1.90 ± 0.41, T = 23 ± 2

With the formulas above, we obtain the weighted average y = 1.75 ± 0.19. Why doesn't y lie between y₁ and y₂? Weird!
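A sketch reproducing these numbers (with the values above: $\sigma_1^2 = 0.1^2 + 0.1^2 \cdot 2^2 = 0.05$, $\sigma_2^2 = 0.1^2 + 0.2^2 \cdot 2^2 = 0.17$, $\mathrm{cov} = 0.1 \cdot 0.2 \cdot 2^2 = 0.08$):

```python
import numpy as np

V = np.array([[0.05, 0.08],
              [0.08, 0.17]])        # covariance matrix built above
y = np.array([1.80, 1.90])

Vinv = np.linalg.inv(V)
w = Vinv.sum(axis=1) / Vinv.sum()    # BLUE weights (1.5, -0.5): w_2 is negative!
print(w @ y, np.sqrt(w @ V @ w))     # 1.75, ~0.19 as on the slide
```

The negative weight is what lets the average fall outside the interval $[y_1, y_2]$.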

71. Weighted average of correlated measurements: interesting example (III)

y₁ and y₂ were calculated assuming T = 23. The fit adjusts the temperature and finds the best agreement at $\hat{T} = 22$. The temperature is a nuisance parameter in this case; here, the data themselves provide information about the nuisance parameter.

72. Confidence intervals

73. In 2006: $M_{\mathrm{top}} = 174.3 \pm 5.1\ \mathrm{GeV}/c^2$. What does this mean?

'68% of top quarks have masses between 169.2 and 179.4 GeV/c²' — WRONG: all top quarks have the same mass!

'The probability of M_top being in the range 169.2-179.4 GeV/c² is 68%' — WRONG: M_top is what it is; it is either inside or outside this range. P is 0 or 1.

'M_top has been measured to be 174.3 GeV/c² using a technique which has a 68% probability of being within 5.1 GeV/c² of the true result' — RIGHT: if we repeated the measurement many times, we would obtain many different intervals; they would bracket the true M_top in 68% of all cases.

77. Point estimates, limits

Often reported: a point estimate and its standard deviation, $\hat{\theta} \pm \hat{\sigma}_{\hat{\theta}}$. In some situations an interval is reported instead, e.g. when the pdf of the estimator is non-Gaussian or there are physical boundaries on the possible values of the parameter.

Goals:
communicate as objectively as possible the result of the experiment
provide an interval that is constructed to cover the true value of the parameter with a specified probability
provide the information needed to draw conclusions about the parameter or to make a particular decision
draw conclusions about the parameter that incorporate stated prior beliefs

With a sufficiently large data sample, the point estimate and standard deviation essentially satisfy all these goals.
