Practical data analysis
Doru Constantin and Guillaume Tresset
doru.constantin@u-psud.fr, guillaume.tresset@u-psud.fr
Laboratoire de Physique des Solides, Orsay.
Outline: References; Variability; Probability Distributions; Large Number Theorems; Width of a distribution; Sampling; Chi-squared distribution; Errors.
References I
◮ Barlow, R. J. (1993). Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences. Chichester, England; New York: Wiley.
◮ Bevington, P. R. (1969). Data Reduction and Error Analysis for the Physical Sciences. New York: McGraw-Hill.
◮ Bevington, P. R. and K. Robinson (2003). Data Reduction and Error Analysis for the Physical Sciences (3 ed.). New York: McGraw-Hill.
◮ Bohm, G. and G. Zech (2010). Introduction to Statistics and Data Analysis for Physicists. Hamburg: Verlag Deutsches Elektronen-Synchrotron. Freely available online from http://www-library.desy.de/preparch/books/vstatmp_engl.pdf
References II
◮ Drosg, M. (2009). Dealing with Uncertainties (2 ed.). Springer.
◮ Feller, W. (1968). An Introduction to Probability Theory and Its Applications (3 ed.). New York: Wiley.
◮ Grinstead, C. M. and J. L. Snell (1997). Introduction to Probability (2 ed.). American Mathematical Society. Freely available online from http://www.dartmouth.edu/~chance/
◮ Hughes, I. G. and T. P. A. Hase (2010). Measurements and their Uncertainties. Oxford: Oxford University Press. Short and very legible introduction.
References III
◮ Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge: Cambridge University Press.
◮ Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery (1992). Numerical Recipes in C: The Art of Scientific Computing (2 ed.). Cambridge: Cambridge University Press.
◮ Taylor, J. R. (1997). An Introduction to Error Analysis (2 ed.). Sausalito: University Science Books.
Variability
1. When measuring the height of all adult males in a certain town, one finds 177 ± 5 cm.
2. The charge of the electron is $(1.602176565 \pm 0.000000035) \times 10^{-19}$ C.
The meaning of probability
Casting a die:
1. Out of a large number of trials, each face will come on top about 1 in 6 times.
2. Our state of knowledge gives us no reason to prefer one of the faces over the others.
Each face has a 1/6 probability of coming up.
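The first (frequency) reading of the die example can be illustrated with a short simulation. This is only a sketch: the function name `face_frequencies`, the number of rolls and the fixed seed are choices made here for illustration, not part of the slides.

```python
import random
from collections import Counter

def face_frequencies(n_rolls, seed=0):
    """Relative frequency of each face over n_rolls simulated casts of a fair die."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is reproducible
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

freqs = face_frequencies(60_000)
for face, freq in freqs.items():
    # Each frequency should be close to 1/6 = 0.1667 for a large number of rolls.
    print(face, round(freq, 4))
```

With more rolls the frequencies cluster more tightly around 1/6; the simulation says nothing about the second (state-of-knowledge) reading, which is what justifies using a fair generator in the first place.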
Random variables
◮ A random variable “is simply an expression whose value is the outcome of a particular experiment” (Grinstead & Snell, 1997). It takes values in a certain domain $\Omega$.
◮ This domain (or sample space) can be discrete, $\Omega = \{\omega_1, \omega_2, \ldots, \omega_k, \ldots\} \subset \mathbb{Z}^n$ (finite or countably infinite), or continuous, $\Omega \subset \mathbb{R}^n$.
◮ The elements of the sample space ($\omega_k$ or $x \in \mathbb{R}^n$) are called outcomes. Subsets of $\Omega$ are called events.
◮ We introduce a probability distribution, characterized by a distribution function $m$. In the discrete case, this function satisfies:
$$m(\omega) \ge 0, \ \forall \omega \in \Omega, \qquad \sum_{\omega \in \Omega} m(\omega) = 1.$$
The probability of an event $E$ is defined as:
$$P(E) = \sum_{\omega \in E} m(\omega).$$
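As a minimal sketch of the discrete case, the distribution function $m$ of a fair die can be stored as a dictionary and the two conditions checked directly. The names `is_distribution` and `prob` are hypothetical helpers introduced here; exact fractions are used so that the sum is exactly 1.

```python
from fractions import Fraction

# Distribution function m for a fair six-sided die: m(omega) = 1/6 for each outcome.
m = {omega: Fraction(1, 6) for omega in range(1, 7)}

def is_distribution(m):
    """Check m(omega) >= 0 for all outcomes and that the values sum to 1."""
    return all(p >= 0 for p in m.values()) and sum(m.values()) == 1

def prob(event, m):
    """P(E): sum of m(omega) over the outcomes omega contained in the event E."""
    return sum((m[omega] for omega in event), Fraction(0))

even = {2, 4, 6}  # the event "the face shown is even", a subset of the sample space
print(is_distribution(m), prob(even, m))
```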
Continuous distributions
Let $X$ be a continuous real-valued random variable. A density function for $X$ is a function $f : \Omega \to \mathbb{R}$ such that
$$P(a \le X \le b) = \int_a^b f(x)\,\mathrm{d}x, \quad \forall a, b \in \mathbb{R},$$
$$P(X \in E) = \int_E f(x)\,\mathrm{d}x, \quad \forall E \subset \mathbb{R},$$
$$P([x, x + \mathrm{d}x]) = f(x)\,\mathrm{d}x.$$
$f(x)\,\mathrm{d}x$ is the probability of finding $X$ in the infinitesimal interval $[x, x + \mathrm{d}x]$; the probability of any single outcome $x$ is zero.
The cumulative distribution function of $X$ is:
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,\mathrm{d}t, \quad \text{with } \frac{\mathrm{d}}{\mathrm{d}x} F(x) = f(x).$$
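The relations between the density and the cumulative distribution function can be checked numerically. The sketch below assumes a standard normal density as the example $f$ and uses a plain trapezoidal rule; the lower cutoff of $-10$ standing in for $-\infty$ is an assumption that is adequate only for this particular density.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Example density f(x); any other normalized density could be used instead."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, n=10_000):
    """Composite trapezoidal rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

def cdf(x, lower=-10.0):
    """F(x) = P(X <= x), with -infinity replaced by a cutoff far in the tail."""
    return integrate(normal_pdf, lower, x)

p_1sigma = integrate(normal_pdf, -1.0, 1.0)              # P(-1 <= X <= 1)
eps = 1e-3
slope = (cdf(0.5 + eps) - cdf(0.5 - eps)) / (2 * eps)    # finite-difference dF/dx at x = 0.5
print(p_1sigma, cdf(0.0), slope, normal_pdf(0.5))
```

The finite-difference slope of $F$ agrees with $f$ at the same point, which is the statement $\mathrm{d}F/\mathrm{d}x = f$ in numerical form.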
Central tendency
Figure: Log-normal distribution with parameters µ = 0 and σ = 0.25 (solid line) and σ = 1 (dashed line). The mean (blue), median (green) and mode (red) are shown for both curves.
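The positions of the three markers in the figure can be reproduced from the standard closed-form expressions for a log-normal distribution (mean $e^{\mu + \sigma^2/2}$, median $e^{\mu}$, mode $e^{\mu - \sigma^2}$). These formulas are not stated on the slide; the helper name `lognormal_summary` is introduced here for illustration.

```python
import math

def lognormal_summary(mu, sigma):
    """Closed-form mean, median and mode of a log-normal distribution."""
    return {
        "mean": math.exp(mu + sigma**2 / 2),
        "median": math.exp(mu),
        "mode": math.exp(mu - sigma**2),
    }

narrow = lognormal_summary(0.0, 0.25)  # solid curve in the figure
wide = lognormal_summary(0.0, 1.0)     # dashed curve in the figure
print(narrow)
print(wide)
```

For σ = 0.25 the three values nearly coincide, while for σ = 1 the strong right skew separates them in the order mode < median < mean.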
Spread
Figure: Boxplot details, compared with a normal distribution (horizontal axis from −4σ to 4σ). The quartiles Q1 and Q3 lie at −0.6745σ and 0.6745σ around the median and enclose 50% of the probability (IQR = Q3 − Q1). The whiskers at Q1 − 1.5 × IQR and Q3 + 1.5 × IQR lie at −2.698σ and 2.698σ, with 24.65% of the probability between each quartile and its whisker. For comparison, the interval from −1σ to 1σ contains 68.27%, with 15.73% on each side between 1σ and 3σ.
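The numbers quoted in the boxplot figure follow from the standard normal distribution and can be recomputed with the standard library. This is a sketch using `statistics.NormalDist`; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mu = 0, sigma = 1, so values are in units of sigma

q3 = z.inv_cdf(0.75)            # upper quartile; Q1 = -Q3 by symmetry
iqr = 2 * q3                    # interquartile range Q3 - Q1
upper_whisker = q3 + 1.5 * iqr  # whisker position Q3 + 1.5 * IQR

between_q3_and_whisker = z.cdf(upper_whisker) - z.cdf(q3)
within_1sigma = z.cdf(1) - z.cdf(-1)
between_1_and_3sigma = z.cdf(3) - z.cdf(1)

print(q3, upper_whisker)
print(between_q3_and_whisker, within_1sigma, between_1_and_3sigma)
```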
Higher-order moments
$$\gamma_1 = \left\langle \left( \frac{X - \mu}{\sigma} \right)^3 \right\rangle \ \text{(skewness)}; \qquad \gamma_2 = \left\langle \left( \frac{X - \mu}{\sigma} \right)^4 \right\rangle - 3 \ \text{(kurtosis)}$$
Graphics by MarkSweep. Licensed under Public domain via Wikimedia Commons
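Applied to a finite data set, the two definitions become averages of the standardized values raised to the third and fourth power. The sketch below uses the population (divide by n) convention for both µ and σ, which is an assumption; other conventions apply small-sample corrections.

```python
import math

def standardized_moment(data, order):
    """Average of ((x - mu) / sigma)**order, with population mean and standard deviation."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return sum(((x - mu) / sigma) ** order for x in data) / n

def skewness(data):
    """gamma_1: third standardized moment."""
    return standardized_moment(data, 3)

def excess_kurtosis(data):
    """gamma_2: fourth standardized moment minus 3 (zero for a normal distribution)."""
    return standardized_moment(data, 4) - 3.0

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 10]
print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed))
```

The symmetric sample has zero skewness and a negative $\gamma_2$ (flatter than a normal distribution), while the sample with one large value has positive skewness.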
Uniform
◮ All outcomes have equal probability.
◮ $U(x; a, b) = \begin{cases} \dfrac{1}{b - a} & \text{for } x \in [a, b] \\ 0 & \text{otherwise} \end{cases}$
◮ $\mu = \frac{1}{2}(a + b)$, $m = \frac{1}{2}(a + b)$, $M$ = any value in $[a, b]$.
◮ $\sigma^2 = \frac{1}{12}(b - a)^2$, $\gamma_1 = 0$, $\gamma_2 = -6/5$
◮ One cannot have a uniform distribution over an infinite domain (discrete or continuous)!
Figure: density $f(x)$, equal to $1/(b - a)$ on $[a, b]$, and cumulative distribution $F(x)$, rising from 0 at $a$ to 1 at $b$.
Graphics by IkamusumeFan. Licensed under CCA-SA 3.0 via Wikimedia Commons
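The values of $\sigma^2$ and $\gamma_2$ can be recovered exactly: integrating $(x - \mu)^n / (b - a)$ over $[a, b]$ gives $w^n / (n + 1)$ for even $n$, with $w = (b - a)/2$, and zero for odd $n$. The sketch below evaluates this with exact fractions; the function name and the interval $[2, 5]$ are arbitrary choices for illustration.

```python
from fractions import Fraction

def uniform_central_moment(a, b, order):
    """<(X - mu)^order> for U(x; a, b): w**order / (order + 1) with w = (b - a) / 2."""
    if order % 2:                      # odd central moments vanish by symmetry
        return Fraction(0)
    half_width = (Fraction(b) - Fraction(a)) / 2
    return half_width**order / (order + 1)

a, b = 2, 5
variance = uniform_central_moment(a, b, 2)                 # (b - a)^2 / 12
gamma_1 = uniform_central_moment(a, b, 3)                  # zero: symmetric density
gamma_2 = uniform_central_moment(a, b, 4) / variance**2 - 3
print(variance, gamma_1, gamma_2)
```

The result for $\gamma_2$ does not depend on $a$ and $b$, since the factor $(b - a)^4$ cancels in the ratio.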
Binomial
◮ Number $k$ of successes in a sequence of $n$ independent yes/no experiments (Bernoulli trials), each of which yields success with probability $p$.
◮ $B(k; n, p) = C_n^k \, p^k (1 - p)^{n - k}$; $k \in \{0, 1, \ldots, n\}$
◮ $\mu = np$, $m = \lfloor np \rfloor$ or $\lceil np \rceil$, $M = \lfloor (n + 1)p \rfloor$ or $\lceil (n + 1)p \rceil - 1$.
◮ $\sigma^2 = np(1 - p)$, $\gamma_1 = \dfrac{1 - 2p}{\sqrt{np(1 - p)}}$, $\gamma_2 = \dfrac{1 - 6p(1 - p)}{np(1 - p)}$
◮ $k$ is the variable, $n$ and $p$ are parameters.
Figure: binomial distributions for p = 0.5 and n = 20, p = 0.7 and n = 20, p = 0.5 and n = 40 (horizontal axis from 0 to 40).
Graphics by Tayste. Licensed under Public domain via Wikimedia Commons
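The expressions for µ, σ² and M can be checked by summing over the distribution directly. The sketch below uses the parameters of one of the plotted curves (p = 0.7, n = 20) and `math.comb` for the binomial coefficient $C_n^k$; the function name `binomial_pmf` is introduced here.

```python
import math

def binomial_pmf(k, n, p):
    """B(k; n, p) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.7
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * w for k, w in enumerate(pmf))                    # expected: n * p
variance = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))  # expected: n * p * (1 - p)
mode = max(range(n + 1), key=pmf.__getitem__)                   # most probable k

print(sum(pmf), mean, variance, mode)
```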
Normal
◮ Very widely encountered.
◮ $N(x; \mu, \sigma) = \dfrac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$; $x \in \mathbb{R}$
◮ $\langle X \rangle = m = M = \mu$
$\langle (X - \mu)^2 \rangle = \sigma^2$, $\gamma_1 = 0$, $\gamma_2 = 0$
Figure: density (top) and cumulative distribution (bottom) for µ = 0, σ² = 0.2; µ = 0, σ² = 1.0; µ = 0, σ² = 5.0; µ = −2, σ² = 0.5 (x from −5 to 5).
Graphics by Inductiveload. Licensed under Public domain via Wikimedia Commons
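The distinction between the raw second moment $\langle X^2 \rangle = \sigma^2 + \mu^2$ and the variance $\langle (X - \mu)^2 \rangle = \sigma^2$ matters whenever µ ≠ 0. The sketch below integrates the density numerically for one of the plotted curves (µ = −2, σ² = 0.5); the midpoint rule, the integration range of ±12σ and the helper name `expectation` are assumptions made for this illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """N(x; mu, sigma) as defined on the slide."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def expectation(g, mu, sigma, half_range=12.0, n=20_000):
    """Midpoint-rule estimate of <g(X)> over the interval mu +/- half_range * sigma."""
    a = mu - half_range * sigma
    h = 2 * half_range * sigma / n
    xs = (a + (i + 0.5) * h for i in range(n))
    return h * sum(g(x) * normal_pdf(x, mu, sigma) for x in xs)

mu, sigma = -2.0, math.sqrt(0.5)
mean = expectation(lambda x: x, mu, sigma)
central_second = expectation(lambda x: (x - mu) ** 2, mu, sigma)  # variance
raw_second = expectation(lambda x: x**2, mu, sigma)               # sigma^2 + mu^2
print(mean, central_second, raw_second)
```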
Poisson
◮ Probability of a given number of independent events $k$ occurring in a fixed interval with a known average rate.
◮ $P(k; \lambda) = \dfrac{\lambda^k}{k!} e^{-\lambda}$; $k \in \mathbb{N}$, $\lambda \in \mathbb{R}^+$
◮ $\mu = \lambda$, $m \simeq \lfloor \lambda + 1/3 - 0.02/\lambda \rfloor$, $M = \lceil \lambda \rceil - 1, \lfloor \lambda \rfloor$
◮ $\sigma^2 = \lambda$, $\gamma_1 = \lambda^{-1/2}$, $\gamma_2 = \lambda^{-1}$
◮ Can be seen as the limit of a binomial distribution for large $n$: $P(k; \lambda = np) \simeq B(k; n, p)$
◮ Approaches $N$ for large $\lambda$: $P(k; \lambda) \simeq N(x = k; \mu = \lambda, \sigma^2 = \lambda)$
Graphics by Skbkekas. Licensed under CCA 3.0 via Wikimedia Commons
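The binomial limit can be made concrete by fixing λ = np and increasing n. The sketch below reports the largest pointwise difference between the two distributions; the choice λ = 3, the values of n and the cutoff `k_max` are arbitrary illustrative parameters.

```python
import math

def poisson_pmf(k, lam):
    """P(k; lambda) = lambda**k * exp(-lambda) / k!"""
    return lam**k * math.exp(-lam) / math.factorial(k)

def binomial_pmf(k, n, p):
    """B(k; n, p) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def max_abs_diff(lam, n, k_max=30):
    """Largest |P(k; lam) - B(k; n, lam / n)| for k = 0..k_max (requires n >= k_max)."""
    p = lam / n
    return max(abs(poisson_pmf(k, lam) - binomial_pmf(k, n, p)) for k in range(k_max + 1))

lam = 3.0
diffs = [max_abs_diff(lam, n) for n in (30, 300, 3000)]
print(diffs)
```

The difference shrinks as n grows (and p = λ/n shrinks accordingly), which is the regime in which the Poisson form is a good stand-in for the binomial one.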
Lorentzian
◮ Shape of resonance peaks. Also named after Cauchy (in mathematics) and Breit and Wigner (in spectroscopy).
◮ $L(x; x_0, \gamma) = \dfrac{1}{\pi \gamma \left[ 1 + \left( \frac{x - x_0}{\gamma} \right)^2 \right]}$; $x \in \mathbb{R}$, $x_0 \in \mathbb{R}$, $\gamma \in \mathbb{R}^+$
◮ $m = M = x_0$
◮ No $\mu$ or higher moments!
Graphics by Skbkekas. Licensed under CCA 3.0 via Wikimedia Commons
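The absence of a mean shows up in practice as a sample average that does not settle down, while the sample median still estimates $x_0$. The sketch below draws values by inverting the cumulative distribution function, $x = x_0 + \gamma \tan[\pi(u - 1/2)]$ with $u$ uniform on $[0, 1)$; the seed, the sample size and the values $x_0 = 2$, $\gamma = 1$ are assumptions chosen for illustration.

```python
import math
import random
import statistics

def lorentzian(x, x0=0.0, gamma=1.0):
    """L(x; x0, gamma) = 1 / (pi * gamma * (1 + ((x - x0) / gamma)**2))."""
    return 1.0 / (math.pi * gamma * (1.0 + ((x - x0) / gamma) ** 2))

def sample_lorentzian(n, x0=0.0, gamma=1.0, seed=0):
    """Draw n values by inverting the cumulative distribution function."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    return [x0 + gamma * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

x0, gamma = 2.0, 1.0
samples = sample_lorentzian(100_001, x0, gamma)
sample_median = statistics.median(samples)

# The running average is dominated by rare, very large values and need not converge.
for n in (100, 10_000, 100_001):
    print(n, statistics.fmean(samples[:n]))
print("median:", sample_median)
```

The density also falls to half its peak value at $x = x_0 \pm \gamma$, so $\gamma$ is the half width at half maximum of the peak.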