Probability & Statistics Intro / Review
NEU 560, Jonathan Pillow
Lecture 6, part II
A continuous probability distribution takes values in a continuous space (e.g., $x \in \mathbb{R}$) and is described by a probability density function (pdf) $P(x)$:
• $P(x) \geq 0$
• $\int P(x)\, dx = 1$
• $\Pr(a \leq x \leq b) = \int_a^b P(x)\, dx$
A discrete probability distribution takes a finite (or countably infinite) number of values and is described by a probability mass function (pmf) $P(x)$:
• $P(x) \geq 0$
• $\sum_x P(x) = 1$
Some friendly neighborhood distributions

Continuous:
• Gaussian: $P(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
• multivariate Gaussian: $P(\mathbf{x}; \boldsymbol{\mu}, \Lambda) = \frac{1}{(2\pi)^{n/2} |\Lambda|^{1/2}} \exp\left(-\tfrac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^\top \Lambda^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$
• exponential: $P(x; a) = a e^{-ax}$

Discrete:
• Bernoulli: coin flipping
• binomial: sum of $n$ coin flips, $P(k; n, p) = \binom{n}{k} p^k (1-p)^{n-k}$
• Poisson: $P(k; \lambda) = \frac{\lambda^k}{k!} e^{-\lambda}$, the limit of a sum of $n$ coin flips with $P(\text{heads}) = \lambda/n$ as $n \to \infty$
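As a quick numerical companion (an add-on, not part of the original slides), all of these distributions are available in scipy.stats; a minimal sketch, with parameter values chosen arbitrarily for illustration:

```python
import numpy as np
from scipy import stats

# Continuous: evaluate each pdf at a point
print(stats.norm(loc=0.0, scale=1.0).pdf(0.5))                    # Gaussian
print(stats.multivariate_normal(mean=[0, 0],
                                cov=np.eye(2)).pdf([0.5, -0.5]))  # multivariate Gaussian
print(stats.expon(scale=1 / 2.0).pdf(0.5))                        # exponential, rate a = 2

# Discrete: evaluate each pmf
print(stats.bernoulli(p=0.3).pmf(1))                              # Bernoulli
print(stats.binom(n=10, p=0.3).pmf(3))                            # binomial
print(stats.poisson(mu=3.0).pmf(3))                               # Poisson

# Poisson as the limit of a binomial with p = lam / n
lam, n = 3.0, 10_000
print(stats.binom(n=n, p=lam / n).pmf(3))                         # ≈ Poisson pmf above
```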
Joint density $P(x, y)$:
• positive
• sums (integrates) to 1
[figure: contour plot of a 2D joint density over $(x, y) \in [-3, 3]^2$]
Marginalization (“integration”): $P(y) = \int P(x, y)\, dx$, and likewise $P(x) = \int P(x, y)\, dy$.
[figure: the joint density collapsed (“integrated”) onto each axis to give the marginals]
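On a discretized grid, marginalization is literally summing the joint along one axis; a minimal sketch, assuming a correlated 2D Gaussian joint (the grid size and correlation value are illustrative choices):

```python
import numpy as np

# Discretize a correlated 2D Gaussian joint on a grid over [-3, 3] x [-3, 3]
xs = np.linspace(-3, 3, 200)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs, indexing="ij")
rho = 0.6                                  # correlation between x and y
joint = np.exp(-(X**2 - 2 * rho * X * Y + Y**2) / (2 * (1 - rho**2)))
joint /= joint.sum() * dx * dx             # normalize so it integrates to 1

# Marginalize: integrate the joint over y to get P(x)
p_x = joint.sum(axis=1) * dx
print(np.allclose(p_x.sum() * dx, 1.0))    # the marginal integrates to 1
```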
Conditionalization (“slicing”): $P(y \mid x) = \dfrac{P(x, y)}{P(x)}$ (“joint divided by marginal”).
[figure: a vertical slice through the joint at a fixed x, normalized, gives the conditional $P(y \mid x)$; shown alongside the marginal $P(y)$]
Conditional densities: each value of x picks out a different slice, so in general a different conditional density $P(y \mid x)$.
[figure: conditional densities $P(y \mid x)$ for two different values of x]
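Continuing the grid sketch from the marginalization slide (restated so it runs on its own), conditioning really is slicing the joint and renormalizing:

```python
import numpy as np

# Rebuild the discretized correlated-Gaussian joint from the previous sketch
xs = np.linspace(-3, 3, 200)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs, indexing="ij")
rho = 0.6
joint = np.exp(-(X**2 - 2 * rho * X * Y + Y**2) / (2 * (1 - rho**2)))
joint /= joint.sum() * dx * dx
p_x = joint.sum(axis=1) * dx                     # marginal P(x)

# Condition on x ≈ 1.0: slice the joint, then divide by the marginal
i = np.argmin(np.abs(xs - 1.0))
p_y_given_x = joint[i, :] / p_x[i]               # "joint divided by marginal"
print(np.allclose(p_y_given_x.sum() * dx, 1.0))  # a proper density over y
```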
Bayes’ Rule

$$\underbrace{P(x \mid y)}_{\text{posterior}} = \frac{\overbrace{P(y \mid x)}^{\text{likelihood}}\;\overbrace{P(x)}^{\text{prior}}}{\underbrace{P(y)}_{\text{marginal probability of } y \text{ (“normalizer”)}}}$$
Terminology question:
• When do we call $P(y \mid x)$ a likelihood? A: when considered as a function of x (i.e., with y held fixed).
• Note: as a function of x it doesn’t integrate to 1.
• What’s it called as a function of y, for fixed x? A conditional distribution or sampling distribution.
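A minimal discrete sketch of Bayes’ rule (the prior and likelihood numbers are invented for illustration):

```python
import numpy as np

prior = np.array([0.5, 0.3, 0.2])           # P(x) over three hypotheses
likelihood = np.array([0.10, 0.60, 0.30])   # P(y | x) for one observed y

unnormalized = likelihood * prior           # numerator of Bayes' rule
marginal = unnormalized.sum()               # P(y), the "normalizer"
posterior = unnormalized / marginal         # P(x | y)
print(posterior, posterior.sum())           # posterior sums to 1
```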
Expectations (“averages”)

Expectation is the weighted average of a function of a random variable, weighted according to the distribution of that random variable:
• discrete (pmf): $\mathbb{E}[f(x)] = \sum_x f(x)\, P(x)$
• continuous (pdf): $\mathbb{E}[f(x)] = \int f(x)\, P(x)\, dx$

That is, we average the values of $f(x)$, weighted by how probable they are under $P(x)$.
Monte Carlo evaluation of an expectation:
1. draw samples from the distribution: $x^{(i)} \sim P(x)$ for $i = 1$ to $N$
2. average: $\mathbb{E}[f(x)] \approx \frac{1}{N} \sum_{i=1}^{N} f\big(x^{(i)}\big)$
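A minimal Monte Carlo sketch, assuming $P(x)$ is a standard Gaussian and $f(x) = x^2$, so the true expectation is $\mathbb{E}[x^2] = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# 1. draw samples x^(i) ~ P(x), here a standard Gaussian
samples = rng.standard_normal(N)

# 2. average f(x^(i)); for f(x) = x^2 the true value is E[x^2] = 1
estimate = np.mean(samples**2)
print(estimate)  # ≈ 1.0
```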
It’s really just a dot product (between the vector of values $f(x)$ and the vector of probabilities $P(x)$)! Thus, expectation is a linear function: $\mathbb{E}[a f(x) + b g(x)] = a\, \mathbb{E}[f(x)] + b\, \mathbb{E}[g(x)]$.
The two most important expectations (also known as “moments”):
• Mean: $\mathbb{E}[x]$ (average value of the RV)
• Variance: $\mathbb{E}[(x - \mathbb{E}[x])^2]$ (average squared distance between x and its mean)

Note: expectations don’t always exist! e.g., the Cauchy distribution $P(x) = \frac{1}{\pi (1 + x^2)}$ has no mean!
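A quick numerical illustration (an add-on, not from the slides): the running mean of Cauchy samples never settles down, unlike the Gaussian case:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Running means: the Gaussian converges to 0, the Cauchy keeps jumping
gauss = np.cumsum(rng.standard_normal(N)) / np.arange(1, N + 1)
cauchy = np.cumsum(rng.standard_cauchy(N)) / np.arange(1, N + 1)
print(gauss[-1])   # ≈ 0: the law of large numbers applies
print(cauchy[-1])  # arbitrary: the Cauchy mean does not exist
```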
Independence

Definition: x, y are independent iff $P(x, y) = P(x)\, P(y)$.

In linear algebra terms: the (discretized) joint is an outer product of the marginals, $P_{xy} = p_x\, p_y^\top$ (a minimal numerical check appears below).
[figure: joint density of independent x and y]
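A minimal sketch of the outer-product view, with a small made-up pair of marginal pmfs:

```python
import numpy as np

p_x = np.array([0.2, 0.5, 0.3])             # marginal pmf of x
p_y = np.array([0.6, 0.4])                  # marginal pmf of y

# Independent joint = outer product of the marginals
joint = np.outer(p_x, p_y)
print(np.allclose(joint.sum(axis=1), p_x))  # marginalizing recovers P(x)
print(np.allclose(joint.sum(axis=0), p_y))  # ... and P(y)
print(np.allclose(joint,
                  np.outer(joint.sum(axis=1), joint.sum(axis=0))))  # independent
```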
Alternative definition: all conditionals are the same! $P(y \mid x) = P(y)$ for every x.
[figure: the conditional densities $P(y \mid x)$ are identical for all values of x]
Correlation vs. Dependence

1. Correlation: the mean of y|x changes systematically with x.
[figure: scatter plots illustrating positive and negative correlation]

2. Dependence:
• arises whenever $P(x, y) \neq P(x)\, P(y)$
• quantified by mutual information, the KL divergence between the joint and the product of marginals: $\mathrm{MI}(x, y) = D_{\mathrm{KL}}\big(P(x, y) \,\|\, P(x) P(y)\big)$ (a worked example follows below)
• MI = 0 ⇒ independence
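For a discrete joint, the mutual information is a short computation; a sketch with an invented 2×2 joint pmf:

```python
import numpy as np

# A small dependent joint pmf (rows: x, columns: y)
joint = np.array([[0.30, 0.10],
                  [0.10, 0.50]])
p_x = joint.sum(axis=1)
p_y = joint.sum(axis=0)

# Mutual information = KL(joint || outer product of marginals)
indep = np.outer(p_x, p_y)
mi = np.sum(joint * np.log(joint / indep))
print(mi)  # > 0, so x and y are dependent; MI = 0 iff independent
```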
Correlation vs. Dependence

Q: Can you draw a distribution that is uncorrelated but dependent?
A: “Bowtie” dependencies in natural scenes: for two filters applied to a natural image (e.g., the flower image in Schwartz & Simoncelli 2001), the conditional P(filter 2 output | filter 1 output) widens as the magnitude of filter 1’s output grows. The two outputs are uncorrelated but dependent.
[figure: bowtie-shaped conditional histogram of filter 2 output given filter 1 output]
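The bowtie idea can be mimicked with a toy construction (an assumption for illustration, not the filter-output data): let y = s·x for a random sign s, so x and y are uncorrelated yet |y| = |x|:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

x = rng.standard_normal(N)
s = rng.choice([-1.0, 1.0], size=N)   # random sign, independent of x
y = s * x                             # |y| = |x|: clearly dependent

print(np.corrcoef(x, y)[0, 1])        # ≈ 0: uncorrelated
print(np.corrcoef(x**2, y**2)[0, 1])  # = 1: the squares are perfectly correlated
```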
Is this distribution independent?
[figure: joint density whose contours look roughly circular]
No! The conditionals over y are different for different x!
FUN FACT: The independent Gaussian is the only distribution that is both:
• independent (equal to the product of its marginals)
• spherically symmetric: $P(\mathbf{x}) = P(U\mathbf{x})$ for any orthogonal matrix $U$

Corollary: a circular scatter / contour plot is not sufficient to show independence! (See the sketch below.)
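A sketch of the corollary (illustrative, with Laplace-distributed coordinates standing in for a generic non-Gaussian case): rotating independent non-Gaussian coordinates destroys independence, while rotated Gaussian coordinates stay independent:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Independent Laplace coordinates: independent but NOT spherically symmetric
z = rng.laplace(size=(2, N))

# Rotate by 45 degrees (an orthogonal matrix)
theta = np.pi / 4
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = U @ z

# Still uncorrelated, but now dependent: the squares are correlated
print(np.corrcoef(x[0], x[1])[0, 1])        # ≈ 0
print(np.corrcoef(x[0]**2, x[1]**2)[0, 1])  # noticeably > 0: dependence

# For Gaussian coordinates, the rotation preserves independence
g = U @ rng.standard_normal((2, N))
print(np.corrcoef(g[0]**2, g[1]**2)[0, 1])  # ≈ 0
```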
Summary
• continuous & discrete distributions
• marginalization (“splatting” / “integration”)
• conditionalization (“slicing”)
• Bayes’ rule (prior, likelihood, posterior)
• expectations
• independence & correlation