

  1. Probabilities and Statistics – An introduction to concepts and terminology. Christoph Rosemann, DESY, 18 March 2013

  2. Outline
     - Probability
     - Distributions and their characterization
     - Examples for specific distributions
     - Central limit theorem
     - From univariate to multivariate
     - Parameter transformation
     - Error propagation

  3. Literature
     Short summaries: the PDG articles on probability and statistics (by Glen Cowan)
     Introductory books: R.J. Barlow, Statistics; Glen Cowan, Statistical Data Analysis
     More advanced: V. Blobel/E. Lohrmann, Statistische und numerische Methoden der Datenanalyse (in German), now also online: http://desy.de/~blobel/ebuch.html
     Wes Metzger (online book): http://www.hef.ru.nl/~wes/stat_course/statistics.html

  4. Probability
     Fundamental motivation: in physics it is impossible to determine (or predict) the outcome of an experiment perfectly, regardless of whether it is classical (deterministic) or quantum physics. The outcome of an experiment is governed more or less by chance!
     Inversion of the problem: determine some physically meaningful quantity, including its uncertainty, from an experiment with limited accuracy.
     Goal of the(se) lecture(s): to teach the mathematical methods to analyze data, including their uncertainties.

  5. An example from quantum mechanics
     Take a radioactive substance with decay constant λ. Quantum mechanics yields a description of the decay probability dp of a nucleus within an interval dτ:
        dp = λ dτ
     For a given number n of these nuclei, the mean number of decays in an interval dτ is:
        −dn(τ) = n(τ) − n(τ + dτ) = n(τ) λ dτ   ⇒   dn(τ)/n(τ) = −λ dτ
     Integration from τ = 0 to a time t yields the known decay law:
        n(t) = n₀ e^(−λt)
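
The decay law can be verified with a short Monte Carlo sketch (this code is not from the lecture; the decay constant and sample size are arbitrary choices). Each nucleus is given an exponentially distributed decay time, and the surviving fraction is compared with n(t) = n₀ e^(−λt).

```python
import numpy as np

# Each nucleus decays at a random time drawn from an exponential
# distribution with rate lam; on average the surviving fraction
# follows the decay law n(t) = n0 * exp(-lam * t).

rng = np.random.default_rng(42)
lam = 0.5          # decay constant (per unit time), arbitrary choice
n0 = 100_000       # initial number of nuclei, arbitrary choice

decay_times = rng.exponential(scale=1.0 / lam, size=n0)

for t in [0.5, 1.0, 2.0, 4.0]:
    surviving = np.sum(decay_times > t)
    expected = n0 * np.exp(-lam * t)
    print(f"t={t:4.1f}  simulated={surviving:7d}  expected={expected:9.1f}")
```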

  6. Remarks on the decay example
     - λ is a probability (per time)
     - The decay law describes only the average number of decays
     - The actual number of decays is not constant; there are statistical variations
     - How do I determine the decay constant λ (or lifetime τ = 1/λ) from this fluctuating outcome?
     The fundamental problem: how do I determine the underlying constant from statistically varying measurements?

  7. Statistical and systematic errors
     A mathematical theory exists for statistical uncertainties:
     - Single measurements possibly have large uncertainties – determine their size
     - Averaging repeated measurements reduces the uncertainty – but that is not always possible
     - It tells how long/often one has to measure to reach a given accuracy
     Systematic errors are a different story:
     - E.g. limited accuracy of detector components, limited knowledge
     - No mathematical theory! They usually have to be determined by the experimentalist
     - Once determined, they can usually be incorporated into the result like statistical errors ⇒ special lecture

  8. Definitions of probability
     Classical (Laplace) – principle of symmetry:
        p = (# favorable cases) / (# all possible cases)
     Frequentist: the relative frequency determines the probability. In the limit of von Mises this is the extension of the classical interpretation:
        p = lim_{n→∞} k/n,
     where k counts the number of favorable outcomes in n tries.
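
A minimal simulation of the frequentist limit (the example is invented here, the slides do not prescribe one): the relative frequency k/n of rolling a six with a fair die approaches the classical symmetry value 1/6 as n grows.

```python
import numpy as np

# The relative frequency k/n of a favorable outcome approaches the
# classical probability as the number of tries n grows.

rng = np.random.default_rng(1)

for n in [100, 10_000, 1_000_000]:
    rolls = rng.integers(1, 7, size=n)   # fair die: values 1..6
    k = np.sum(rolls == 6)               # number of favorable outcomes
    print(f"n={n:9d}  k/n={k / n:.5f}  (classical value: {1/6:.5f})")
```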

  9. Further definitions of probability
     Subjective (or Bayesian): probability is a measure of the degree of belief. Usually this is assigned prior to doing an experiment.
     Axiomatic (Kolmogorov): let S be a sample space of events and A, B subsets.
     1. p(A) ∈ [0, 1]
     2. p(A ∪ B) = p(A) + p(B), if A ∩ B = ∅
     3. p(S) = 1

  10. Frequentist vs. subjectivist
     - Often the calculations and results are the same
     - The frequentist approach is sometimes impossible or useless:
       - e.g. the underlying principle of betting
       - e.g. searches for new particles
       - harsh requirement: the experiment should be repeatable arbitrarily often under the same circumstances
     - Still, insurance companies are very good at predicting the future from past experience
     - The topic is far from trivial
     - Usually we will take a frequentist view

  11. Conditional probabilities
     The conditional probability p(A|B) is the probability that an event A occurs under the assumption that event B also occurs, defined by
        p(A|B) = p(A ∩ B) / p(B).
     With p(A ∩ B) = p(A|B) p(B) = p(B|A) p(A) follows Bayes' theorem:
        p(A|B) = p(B|A) p(A) / p(B)
     (The theorem itself has absolutely no relation to the Bayesian view of probabilities.)
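
A quick numerical sanity check of Bayes' theorem (the events and probabilities below are made up for illustration): simulate two dependent binary events and compare the directly estimated p(A|B) with p(B|A) p(A) / p(B).

```python
import numpy as np

# Simulate two dependent binary events A and B and verify that the
# directly counted p(A|B) agrees with p(B|A) * p(A) / p(B).

rng = np.random.default_rng(7)
n = 1_000_000

a = rng.random(n) < 0.3                    # event A with p(A) = 0.3
# B depends on A: p(B|A) = 0.8, p(B|not A) = 0.1 (arbitrary choices)
b = rng.random(n) < np.where(a, 0.8, 0.1)

p_a = a.mean()
p_b = b.mean()
p_b_given_a = b[a].mean()

p_a_given_b_direct = a[b].mean()
p_a_given_b_bayes = p_b_given_a * p_a / p_b

print(f"direct  p(A|B) = {p_a_given_b_direct:.4f}")
print(f"Bayes   p(A|B) = {p_a_given_b_bayes:.4f}")
```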

  12. Statistical independence
     A and B are statistically independent if the occurrence of B is independent of the occurrence of A:
        p(B|A) = p(B)   and   p(A|B) = p(A)
     In this case the probabilities can be multiplied to yield the joint probability:
        p(A ∩ B) = p(A) p(B)

  13. Example of statistical independence
     Take the following events, picked at random from a long time period:
        A: it is Saturday, p(A) = 1/7
        B: it is a day with snowfall (in Hamburg), p(B) ≈ 1/40
     For a random day these are independent, so the probability to pick a Saturday on which it snowed is
        p(A ∩ B) = p(A|B) p(B) = p(A) p(B) ≈ 1/280
     Now change A: it is January, p(A) ≈ 1/12. These events are not independent – the probability of snowfall depends heavily on the season!
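
The multiplication rule of the Saturday/snowfall example can be reproduced by simulation (the independent-sampling construction below is an illustration of mine, not from the slides):

```python
import numpy as np

# Day of week and snowfall are drawn independently, so the joint
# frequency should approach p(A)p(B) = (1/7)(1/40) = 1/280 ~ 0.00357.

rng = np.random.default_rng(3)
n = 2_000_000

is_saturday = rng.integers(0, 7, size=n) == 0     # p(A) = 1/7
snows = rng.random(n) < 1 / 40                    # p(B) = 1/40

print(f"simulated p(A and B) = {(is_saturday & snows).mean():.5f}")
print(f"p(A) * p(B)          = {(1/7) * (1/40):.5f}")
```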

  14. Section 2: Description of probability – distributions and their characteristics

  15. Description of probability
     Most relevant: physics experiments whose outcome varies statistically. View the outcome of an experiment as a random variable. Two major categories:
     - discrete random variables: enumerable values k ∈ N, k ∈ [a, b]
     - continuous random variables: continuous values x ∈ R

  16. Discrete random variables
     The probability to obtain a certain value r is P(r). The axioms demand
        ∑_{r=a}^{b} P(r) = 1
     Example: counting decays in a time interval
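
As a numerical check of the normalization ∑ P(r) = 1 (the choice of distribution is mine: decay counts in a fixed interval follow a Poisson distribution, one of the specific distributions announced in the outline):

```python
from scipy.stats import poisson

# A discrete distribution must fulfil sum_r P(r) = 1; here the Poisson
# pmf is summed over enough values that the remaining tail is negligible.

mu = 3.5                                        # mean number of decays
total = sum(poisson.pmf(k, mu) for k in range(100))
print(f"sum of P(k), k = 0..99: {total:.12f}")  # ~1.0
```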

  17. Continuous random variables
     Probability is only defined for an interval. The probability for a ≤ x < b is defined by the integral
        P(a ≤ x < b) = ∫_a^b f(x) dx
     where f(x) is the probability density function, or pdf. The axioms demand
        f(x) ≥ 0 ∀ x in the definition range   and   ∫_{−∞}^{+∞} f(x) dx = 1
     The cumulative probability F(x) describes the probability to obtain a value x or smaller:
        F(x) = ∫_{−∞}^{x} f(x′) dx′
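
These integral relations can be checked numerically; the standard normal pdf below is an arbitrary choice for illustration, not prescribed by the slides.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# The pdf integrates to 1 over the full range, and F(x) is the
# integral of f from -infinity to x.

total, _ = quad(norm.pdf, -np.inf, np.inf)
print(f"integral of f over all x: {total:.6f}")        # ~1.0

x = 1.0
F_numeric, _ = quad(norm.pdf, -np.inf, x)
print(f"F({x}) by integration: {F_numeric:.6f}")       # ~0.841345
print(f"F({x}) from norm.cdf:  {norm.cdf(x):.6f}")
```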

  18. Different definitions of probabilities
     Discrete random variables can be assigned probabilities directly; e.g. a single value within the boundaries has a certain probability. Continuous random variables have to be integrated over an interval; it does not make sense to determine the probability of a single value! Fundamental is the probability density function (pdf) f(x). The axioms of probability demand that f(x) ≥ 0 and that the sum/integral over all possible values is 1.

  19. Characterizations of probability distributions I
     The expectation value with respect to a function h(x) is defined as
        E[h(x)] = ∫_{−∞}^{+∞} h(x) · f(x) dx.
     If h(x) is a function like h(x) = x^n (with n ∈ N), this expectation value is called the n-th moment of the distribution:
        E[x^n] = ∫_{−∞}^{+∞} x^n · f(x) dx.
     The most important is the first moment of the pdf, its mean value:
        ⟨x⟩ = E[x] = ∫_{−∞}^{+∞} x · f(x) dx = µ.
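
The moment integrals translate directly into numerical quadrature; the exponential pdf with mean 2 used below is an example of my choosing.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import expon

def moment(n, pdf, lo, hi):
    """n-th moment E[x^n] of a pdf by numerical integration."""
    value, _ = quad(lambda x: x**n * pdf(x), lo, hi)
    return value

pdf = expon(scale=2.0).pdf   # exponential pdf with mean mu = 2
print(f"0th moment (normalization): {moment(0, pdf, 0, np.inf):.4f}")  # 1.0
print(f"1st moment (mean mu):       {moment(1, pdf, 0, np.inf):.4f}")  # 2.0
print(f"2nd moment E[x^2]:          {moment(2, pdf, 0, np.inf):.4f}")  # 8.0
```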

  20. Characterizations of probability distributions II
     The variance: the width of a distribution can be described by a moment with respect to its mean value. These are functions like h(x) = (x − ⟨x⟩)^n, called central moments. The most important is the second central moment, the variance:
        σ² = E[(x − ⟨x⟩)²] = ∫_{−∞}^{+∞} (x − ⟨x⟩)² f(x) dx.
     Higher central moments are much less important, but can sometimes help. Examples are the skewness (∼ x³) and the kurtosis (∼ x⁴).
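
Continuing the (self-chosen) exponential example from above, the variance follows from the second central moment, or equivalently from σ² = E[x²] − µ²:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import expon

# Variance as the second central moment sigma^2 = E[(x - mu)^2].

pdf = expon(scale=2.0).pdf
mu, _ = quad(lambda x: x * pdf(x), 0, np.inf)
var, _ = quad(lambda x: (x - mu)**2 * pdf(x), 0, np.inf)
print(f"mu = {mu:.4f}, sigma^2 = {var:.4f}")   # 2.0 and 4.0
# Equivalent shortcut: sigma^2 = E[x^2] - mu^2 = 8 - 4 = 4
```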

  21. Characterizations of probability distributions III
     Other characterizations are also widely in use.
     Describing the central value: the median x_m is the value where the probabilities to obtain a larger or a smaller value are equal, F(x_m) = 0.5. The most probable value is the absolute maximum of the pdf.
     Describing the width: the FWHM (full width at half maximum) is the positive difference of the two x-values where the pdf has dropped to half of its maximum value.
     Describing the width w.r.t. the mean: the RMS (root mean square) is the square root of the second moment,
        x_rms = √(E[x²]) = √(σ² + µ²).
     Note: ROOT means the standard deviation σ when it says RMS.
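
A sampled illustration of x_rms = √(σ² + µ²) and of the ROOT naming pitfall (the Gaussian sample parameters below are arbitrary choices):

```python
import numpy as np

# For a Gaussian sample with mu = 3 and sigma = 2, the true RMS is
# sqrt(sigma^2 + mu^2) = sqrt(13) ~ 3.606, while the "RMS" reported by
# ROOT is the standard deviation (numpy's std), here ~2.0.

rng = np.random.default_rng(5)
x = rng.normal(loc=3.0, scale=2.0, size=1_000_000)

x_rms = np.sqrt(np.mean(x**2))
mu = np.mean(x)
sigma = np.std(x)

print(f"x_rms                  = {x_rms:.4f}")
print(f"sqrt(sigma^2 + mu^2)   = {np.sqrt(sigma**2 + mu**2):.4f}")
print(f"sigma (ROOT's 'RMS')   = {sigma:.4f}")
```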
