
Statistics & Bayesian Inference Lecture 1 - Joe Zuntz - PowerPoint PPT Presentation



  1. Statistics & Bayesian Inference Lecture 1 Joe Zuntz

  2. Lecture 1: Essentials of probability • Motivations • Definitions • Probability distributions • Basic probability operations • Some analytic distributions • Bayes Theorem • Models & parameter spaces • How scientists can use probability

  3. Motivations • Learn as much as possible from our (expensive) data, e.g. H₀ = (72 ± 8) km s⁻¹ Mpc⁻¹ • Constrain parameters in models • Test & compare models • Characterize collections of numbers

  4. Probability Distributions: Definitions • Assign real number P ≥ 0 to each member of a sample space (discrete or continuous, finite or infinite) • P = probability density function (PDF) or probability mass function (PMF) • This set represents possible outcomes of an experiment/game/event/situation • e.g. possible results of tossing two coins (HH, HT, TH, TT, each with probability 0.25), or the height of the next person to walk through the door


  6. Probability Distributions: Definitions • A random variable X is any value subject to randomness, e.g.: • was the first toss heads? • was the sequence Heads-Tails? • were both tosses the same? • Discrete X: P is a list of values • Continuous X: P is a function (the PDF), which we have to integrate to answer questions

  7. Probability Distributions: Basic properties • Since X must have exactly one value: • Discrete: ∑_{x∈X} P(x) = 1 • Continuous: ∫_{x∈X} P(x) dx = 1 • P(X=x) = f(x); usually just write P(X) = f(x) • For probabilities, 0 ≤ P(x) ≤ 1 (though a continuous density can exceed 1)
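The normalization rules above can be checked numerically; a minimal sketch (not from the slides), using the fair two-coin PMF and a unit Gaussian PDF:

```python
import numpy as np

# Discrete: the two-coin PMF must sum to 1
pmf = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}
total = sum(pmf.values())

# Continuous: numerically integrate a unit Gaussian PDF over a wide range
x = np.linspace(-10, 10, 100001)
pdf = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
integral = np.trapz(pdf, x)
```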

  8. Probability Distributions: Combining Probabilities • Joint probability 
 P(XY) 
 P(X=x and Y=y) 
 P(X ∩ Y) • Union 
 P(X=x or Y=y) 
 P(X ∪ Y)

  9. Probability Distributions: Combining Probabilities • Conditional 
 P(X=x given Y=y) 
 P(X|Y) • Independence: • P(X|Y) = P(X) • X independent of Y

  10. Probability Distributions: Identities • P(not X) = 1 − P(X) • P(XY) = P(X|Y) P(Y) • P(X ∪ Y) = P(X) + P(Y) − P(X ∩ Y)
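These identities can be verified by brute force on the two-coin sample space; a small illustrative sketch (the events chosen here are hypothetical):

```python
import itertools

# Sample space for two coin tosses, all outcomes equally likely
space = list(itertools.product("HT", repeat=2))
p = lambda event: sum(1 for s in space if event(s)) / len(space)

first_heads = lambda s: s[0] == "H"
second_heads = lambda s: s[1] == "H"

# P(not X) = 1 - P(X)
p_not_first = p(lambda s: not first_heads(s))
# Joint probability P(X and Y)
p_joint = p(lambda s: first_heads(s) and second_heads(s))
# Union: P(X or Y) = P(X) + P(Y) - P(X and Y)
p_union = p(lambda s: first_heads(s) or second_heads(s))
```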

  11. Probability Distributions: Expectations • The expectation (or mean) of a random variable X is given by: • Discrete: E(X) = ∑ P(X) X • Continuous: E(X) = ∫ P(X) X dX • Or of a function of it by: E(f(X)) = ∑ P(X) f(X) or E(f(X)) = ∫ P(X) f(X) dX
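As an illustration (not part of the slides), the discrete and continuous expectation formulas, evaluated for a fair die and a unit Gaussian:

```python
import numpy as np

# Discrete: E(X) = sum P(X) X, for a fair six-sided die
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
mean_discrete = np.sum(pmf * values)   # 3.5

# Continuous: E(f(X)) = integral P(X) f(X) dX, here f(X) = X**2
# for a unit Gaussian, where E(X**2) = 1
x = np.linspace(-10, 10, 200001)
pdf = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
mean_x2 = np.trapz(pdf * x**2, x)
```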

  12. Probability Distributions: Expectations • Expectations are one measure of centrality, and not always a good one. • Mode and median also exist • All just ways of reducing or characterizing a distribution (figure: a distribution with its mode and mean marked)

  13. Probability Distributions: Marginalizing • Discrete: P(x) = ∑ᵢ P(x|yᵢ) P(yᵢ) • Continuous: P(x) = ∫ P(x|y) P(y) dy • If you don’t care about something, marginalize over it
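A minimal sketch of marginalization over a hypothetical 2×2 joint probability table:

```python
import numpy as np

# Joint probability table P(x, y): rows index x, columns index y
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

# Marginalize over y: P(x) = sum_i P(x, y_i)
p_x = joint.sum(axis=1)
p_y = joint.sum(axis=0)

# Equivalent route through the conditional: P(x|y) = P(x, y) / P(y),
# then P(x) = sum_i P(x|y_i) P(y_i)
cond_x_given_y = joint / p_y
p_x_again = (cond_x_given_y * p_y).sum(axis=1)
```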

  14. Probability Distributions: Changing variables • With u = f(x), probability mass must be conserved, not density: P(u) du = P(x) dx • Relate with a Jacobian: P(u) = P(x) dx/du = P(x)/f′(x) • Be especially careful in more dimensions
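The change-of-variables rule can be checked by transforming samples; a sketch assuming x ∼ Uniform(0, 1) and u = x², so the rule predicts P(u) = 1/(2√u):

```python
import numpy as np

rng = np.random.default_rng(42)

# x ~ Uniform(0, 1), u = f(x) = x**2, so f'(x) = 2x and
# P(u) = P(x) / f'(x) = 1 / (2 * sqrt(u))
x = rng.uniform(0, 1, size=200_000)
u = x**2

# Compare a histogram of u against the predicted density,
# skipping bins near u = 0 where the density diverges
counts, edges = np.histogram(u, bins=20, range=(0.0, 1.0), density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
predicted = 1 / (2 * np.sqrt(centres))
mask = centres > 0.1
max_err = np.max(np.abs(counts[mask] - predicted[mask]))
```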

  15. Probability Distributions: Drawing samples • Generate values of X with probability specified by P(X) • Draw enough samples: histogram looks like PDF • See lecture 3
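One common technique for drawing samples, inverse-CDF sampling, can be sketched for the exponential distribution (the rate, seed, and sample size below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Inverse-CDF sampling: for the exponential distribution
# P(x) = lam * exp(-lam * x), the CDF is F(x) = 1 - exp(-lam * x),
# so x = -log(1 - U) / lam with U ~ Uniform(0, 1)
lam = 2.0
u = rng.uniform(size=100_000)
samples = -np.log(1 - u) / lam

# With enough samples the mean approaches 1 / lam
sample_mean = samples.mean()
```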

  16. Probability Distributions: Analytic examples • Wikipedia is brilliant for this • Uniform • Delta function • Gaussian (normal) • Exponential • Poisson • Uniform: P(x) = 1/(b − a), x ∈ [a, b]

  17. Probability Distributions: Analytic examples • Wikipedia is brilliant for this • Uniform • Delta function • Gaussian (normal) • Exponential • Poisson • Delta function: P(x) = δ(x − x₀)

  18. Probability Distributions: Analytic examples • Wikipedia is brilliant for this • Uniform • Delta function • Gaussian (normal) • Exponential • Poisson • Gaussian: P(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))

  19. Probability Distributions: Analytic examples • Wikipedia is brilliant for this • Uniform • Delta function • Gaussian (normal) • Exponential • Poisson • Exponential: P(x) = λ e^(−λx), x > 0

  20. Probability Distributions: Analytic examples • Wikipedia is brilliant for this • Uniform • Delta function • Gaussian (normal) • Exponential • Poisson • Poisson: P(n) = λⁿ e^(−λ) / n!
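All five named distributions can be drawn from directly with NumPy; the parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

uniform = rng.uniform(-1.0, 3.0, n)        # P(x) = 1/(b - a) on [a, b]
gaussian = rng.normal(10.0, 2.0, n)        # mean mu, standard deviation sigma
exponential = rng.exponential(1 / 3.0, n)  # rate lam = 3 -> NumPy scale = 1/lam
poisson = rng.poisson(4.0, n)              # P(n) = lam**n exp(-lam) / n!
# The delta function corresponds to a constant: every draw equals x0
delta = np.full(n, 2.5)
```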

  21. Bayes Theorem and Inference • P(AB) = P(A|B) P(B) = P(B|A) P(A)

  22. Bayes Theorem and Inference • P(AB) = P(A|B) P(B) = P(B|A) P(A) • ∴ P(A|B) = P(B|A) P(A) / P(B)

  23. Bayes Theorem and Inference • P(p|dM) = P(d|pM) P(p|M) / P(d|M) ∝ P(d|pM) P(p|M) • Likelihood: P(d|pM); Prior: P(p|M) • p = parameters, d = observed data, M = model

  24. Bayes Theorem and Inference • What you know after looking at the data = what you knew before + what the data told you
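A minimal worked example of Bayes theorem for inference, on a grid of parameter values (the data, 7 heads in 10 coin tosses, and the flat prior are hypothetical choices):

```python
import numpy as np

# Grid Bayes for a coin's heads probability p:
# posterior ∝ likelihood × prior
p = np.linspace(0, 1, 1001)
prior = np.ones_like(p)                 # what you knew before: flat
likelihood = p**7 * (1 - p)**3          # what the data told you: 7 H in 10 tosses
posterior = likelihood * prior
posterior /= np.trapz(posterior, p)     # normalize over the grid

# Posterior mean; analytically this is a Beta(8, 4) with mean 2/3
posterior_mean = np.trapz(p * posterior, p)
```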

  25. Models & Parameters • A model is the mathematical theory that describes how your data arose. • It is not a theory of how what you wanted to measure arose. • Non-trivial models include some deterministic and some stochastic parts. • Noise is one stochastic part; many (most?) astrophysical models have others too

  26. Models & Parameters • Parameters are any unknown numerical values in your model • A parameter can have a probability distribution • You need (and have) some prior (background) information about all your parameters • This may be subjective!

  27. Parameter Spaces • Can use continuous parameters as dimensions in an abstract space (figure: a 2D parameter space with axes m and c) • Probabilities become functions of many variables: P(u, v, w, x, y, z) • As the dimension of this space increases your intuition becomes worse

  28. Descriptive Statistics • Reduce samples or a distribution to a set of characteristic numbers • In analytic cases this is all you need to describe a distribution • Statistics of samples = estimators/approximations of the underlying distribution’s statistics

  29. Descriptive Statistics: Mean • Distribution mean: E[X] = ∫ X P(X) dX • Sample mean: X̄ = ∑ Xᵢ / N

  30. Descriptive Statistics: Mean • Means can be 
 misleading! • Most distributions are asymmetric

  31. Descriptive Statistics: Variance • Distribution variance: Var(X) = E[(X − X̄)²] = ∫ (X − X̄)² P(X) dX • Sample variance (divide by N): σ²_X = ∑ (Xᵢ − X̄)² / N • Unbiased estimator of the population variance (divide by N − 1): s²_X = ∑ (Xᵢ − X̄)² / (N − 1)
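The N vs. N − 1 divisors correspond to NumPy's `ddof` argument; a small check on hypothetical data:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
xbar = x.mean()

# Divide by N: the sample variance from the slide (NumPy default, ddof=0)
var_n = np.sum((x - xbar)**2) / len(x)
# Divide by N - 1: the unbiased estimator of the underlying
# distribution's variance (ddof=1)
var_n1 = np.sum((x - xbar)**2) / (len(x) - 1)
```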

  32. Descriptive Statistics: Covariance • Cov(X, Y) = E[(X − X̄)(Y − Ȳ)] = ∫ (X − X̄)(Y − Ȳ) P(XY) dX dY • Sample covariance: σ_XY = ∑ (Xᵢ − X̄)(Yᵢ − Ȳ) / N

  33. Descriptive Statistics: Covariance • (figure: two scatter plots of Y against X, one with σ_XY > 0 and one with σ_XY < 0)

  34. Gaussians: The Basics • One-dimensional continuous PDF: P(x; µ, σ) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)) • Two parameters: mean µ, standard deviation σ • Symmetric • Common! But often an over-simplification.

  35. Gaussians: Sigma numbers • Distance from mean defined in number of standard deviations ("sigma") • Probability mass: • 68% within 1σ • 95% within 2σ • 99.7% within 3σ
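The sigma numbers follow directly from the error function; a quick check using the standard library:

```python
import math

# Probability mass within k standard deviations of a Gaussian mean:
# P(|x - mu| < k * sigma) = erf(k / sqrt(2))
within = [math.erf(k / math.sqrt(2)) for k in (1, 2, 3)]
# approximately 0.683, 0.954, 0.997
```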

  36. Gaussians: Properties • The error function is the cumulative integral of a Gaussian • Sigma numbers can be read off it

  37. Gaussians: Properties • Sum of Gaussians has a simple form: X ∼ N(µ_x, σ²_x), Y ∼ N(µ_y, σ²_y) ⇒ X + Y ∼ N(µ_x + µ_y, σ²_x + σ²_y) • Especially useful for the sum of identical Gaussians, leading to the result that the error on the mean scales as σ/√n

  38. Gaussians: Properties • Central limit theorem: given a collection of random variables Xᵢ: (1/s_n) ∑ᵢ₌₁ⁿ (Xᵢ − µᵢ) → N(0, 1), where s²_n = ∑ᵢ₌₁ⁿ σ²ᵢ • Provided that a Lindeberg-type condition holds: (1/s²_n) ∑ᵢ E[(Xᵢ − µᵢ)² 1(|Xᵢ − µᵢ| > ε s_n)] → 0 for every ε > 0

  39. Gaussians: Properties • Central limit theorem: (figure: the distribution of a single draw, and of the mean of 2, 3, and 4 draws, becoming increasingly Gaussian)
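The central limit theorem can be demonstrated by standardizing means of uniform draws; a sketch (sample sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)

# As n grows, the standardized mean of n uniform draws approaches N(0, 1)
def standardized_means(n, trials=100_000):
    draws = rng.uniform(0, 1, size=(trials, n))
    mu, sigma = 0.5, np.sqrt(1 / 12)   # mean and std of Uniform(0, 1)
    return (draws.mean(axis=1) - mu) / (sigma / np.sqrt(n))

z = standardized_means(30)
# Fraction within 1 sigma should approach the Gaussian 68.3%
frac_within_1sigma = np.mean(np.abs(z) < 1)
```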

  40. Gaussians: Multivariate • P(x; µ, C) = (1/√((2π)ⁿ |C|)) exp(−½ (x − µ)ᵀ C⁻¹ (x − µ)) • C is the covariance matrix - describes correlations between quantities • For example: data points often have correlated errors
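A direct transcription of the multivariate Gaussian PDF (the 2D mean and covariance below are hypothetical examples):

```python
import numpy as np

def multivariate_gaussian(x, mu, C):
    """Evaluate the n-dimensional Gaussian PDF at point x."""
    n = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi)**n * np.linalg.det(C))
    return np.exp(-0.5 * diff @ np.linalg.inv(C) @ diff) / norm

# A 2D example with correlated components
mu = np.array([0.0, 0.0])
C = np.array([[2.0, 0.8],
              [0.8, 1.0]])
peak = multivariate_gaussian(mu, mu, C)   # density at the mean
```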

  41. Interpretations of Probability (Frequentists vs. Bayesians) • Use probabilities to…: describe frequencies vs. quantify information • Think model parameters are…: fixed unknowns vs. random variables with probabilities • Think data is…: a repeatable random variable vs. observed and therefore fixed • Call their work…: "Statistics" vs. "Inference" • Make statements about…: intervals covering the truth x% of the time vs. constraints on model parameters • Have…: many approaches with lots of implicit choices vs. one approach with explicit choices

  42. Why Bayesian probability for science? • Answers the right question • We want facts about the world, not about hypothetical ensembles of experiments • The ideal process is always clear • Practical implementations are more difficult • Problems and questions are more explicit
