Bayesian Estimation & Information Theory
Jonathan Pillow
Mathematical Tools for Neuroscience (NEU 314)
Spring 2016, lecture 18
Bayesian Estimation

Three basic ingredients:
1. Likelihood
2. Prior   (likelihood and prior jointly determine the posterior)
3. Loss function $L(\hat\theta, \theta)$: the "cost" of making estimate $\hat\theta$ if the true value is $\theta$

• Together these fully specify how to generate an estimate from the data.

The Bayesian estimator is defined as:
$$\hat\theta(m) = \arg\min_{\hat\theta} \int L(\hat\theta, \theta)\, p(\theta \mid m)\, d\theta$$
where the integral is the expected loss, or "Bayes' risk".
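As a minimal numerical sketch of this definition (not from the original slides; all names and the toy posterior are illustrative): discretize θ on a grid, and pick the candidate estimate that minimizes the expected loss under the posterior.

import numpy as np

def bayes_estimate(theta_grid, posterior, loss):
    """Grid-based Bayesian estimator: the theta-hat minimizing expected loss."""
    p = posterior / posterior.sum()                      # normalize p(theta | m) on the grid
    risk = np.array([np.sum(loss(th, theta_grid) * p)    # expected loss ("Bayes' risk")
                     for th in theta_grid])
    return theta_grid[np.argmin(risk)]

# toy example: squared-error loss, Gaussian-shaped posterior centered at 1.5
theta = np.linspace(-8, 8, 801)
post = np.exp(-0.5 * (theta - 1.5) ** 2)
print(bayes_estimate(theta, post, lambda th_hat, th: (th_hat - th) ** 2))  # ~1.5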
Typical loss functions and Bayesian estimators

1. Squared-error loss: $L(\hat\theta, \theta) = (\hat\theta - \theta)^2$

Need to find the $\hat\theta$ minimizing the expected loss:
$$\int (\hat\theta - \theta)^2\, p(\theta \mid m)\, d\theta$$
Differentiate with respect to $\hat\theta$ and set to zero:
$$\hat\theta = \int \theta\, p(\theta \mid m)\, d\theta$$
the "posterior mean", also known as the Bayes' Least Squares (BLS) estimator.
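A quick numeric check (illustrative only, not from the slides): for squared-error loss, the grid-based Bayes' risk is minimized at the posterior mean, even for a skewed posterior.

import numpy as np

theta = np.linspace(0, 10, 1001)
post = theta**2 * np.exp(-theta)          # skewed (gamma-shaped) posterior, unnormalized
post /= post.sum()

posterior_mean = np.sum(theta * post)     # BLS estimate = E[theta | m]
risk = np.array([np.sum((th - theta)**2 * post) for th in theta])
print(posterior_mean, theta[np.argmin(risk)])   # the two agree (up to grid resolution)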
Typical loss functions and Bayesian estimators

2. "Zero-one" loss: $L(\hat\theta, \theta) = 1 - \delta(\hat\theta - \theta)$   (equal to 1 unless $\hat\theta = \theta$, where it is 0)

Expected loss:
$$\int \big(1 - \delta(\hat\theta - \theta)\big)\, p(\theta \mid m)\, d\theta = 1 - p(\hat\theta \mid m)$$
which is minimized by $\hat\theta = \arg\max_\theta\, p(\theta \mid m)$:
• the posterior maximum (or "mode")
• known as the maximum a posteriori (MAP) estimate.
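Because the expected zero-one loss is $1 - p(\hat\theta \mid m)$, on a grid the MAP estimate is simply the argmax of the (even unnormalized) posterior; a minimal sketch using the same toy posterior as above:

import numpy as np

theta = np.linspace(0, 10, 1001)
post = theta**2 * np.exp(-theta)        # same skewed posterior as above, unnormalized
map_estimate = theta[np.argmax(post)]   # MAP = posterior mode; normalization is irrelevant
print(map_estimate)                     # ~2.0 for this gamma-shaped posterior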
MAP vs. posterior mean estimate
[Figure: a skewed gamma pdf, with the posterior maximum (MAP) and the posterior mean marked at different locations.]
Note: the posterior maximum and the posterior mean are not always the same!
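For a skewed gamma posterior like the one in the figure, the mode and the mean differ; a small numeric illustration (the specific shape and scale parameters here are assumed, not taken from the slide):

from scipy.stats import gamma

post = gamma(a=3, scale=1)             # assumed parameters, for illustration only
print("MAP (mode):", (3 - 1) * 1)      # mode of a gamma pdf = (a - 1) * scale, for a > 1
print("posterior mean:", post.mean())  # mean = a * scale = 3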
Typical loss functions and Bayesian estimators

3. "L1" loss: $L(\hat\theta, \theta) = |\hat\theta - \theta|$

Expected loss: $\int |\hat\theta - \theta|\, p(\theta \mid m)\, d\theta$

HW problem: What is the Bayesian estimator for this loss function?
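Without giving away the homework answer, you can explore the L1 case numerically with the same grid machinery (variable names illustrative); running it gives a numerical minimizer to check your analytical answer against.

import numpy as np

theta = np.linspace(0, 10, 1001)
post = theta**2 * np.exp(-theta)
post /= post.sum()

# expected |theta_hat - theta| for every candidate estimate; which theta_hat wins?
risk = np.array([np.sum(np.abs(th - theta) * post) for th in theta])
print(theta[np.argmin(risk)])   # compare against your answer to the HW problem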
Simple example: Gaussian noise & prior

1. Likelihood: additive Gaussian noise, $m = \theta + n$ with $n \sim \mathcal{N}(0, \sigma^2)$, so $p(m \mid \theta) = \mathcal{N}(m;\, \theta, \sigma^2)$
2. Prior: zero-mean Gaussian, $p(\theta) = \mathcal{N}(\theta;\, 0, \sigma_p^2)$
3. Loss function: doesn't matter (all estimators agree here, because the posterior is Gaussian and symmetric)

Posterior distribution: Gaussian, with
$$\text{MAP estimate} = \text{posterior mean} = \frac{\sigma_p^2}{\sigma_p^2 + \sigma^2}\, m, \qquad \text{variance} = \frac{\sigma_p^2\, \sigma^2}{\sigma_p^2 + \sigma^2}$$
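A sketch of this Gaussian-Gaussian case under the notation assumed above (measurement noise standard deviation sigma, prior standard deviation sigma_p, both illustrative values); the closed-form posterior mean and variance are the standard conjugate results.

import numpy as np

def gaussian_posterior(m, sigma_noise, sigma_prior):
    """Posterior over theta given m = theta + N(0, sigma_noise^2), theta ~ N(0, sigma_prior^2)."""
    post_var = 1.0 / (1.0 / sigma_noise**2 + 1.0 / sigma_prior**2)
    post_mean = post_var * m / sigma_noise**2        # shrinks m toward the prior mean (0)
    return post_mean, post_var

print(gaussian_posterior(m=4.0, sigma_noise=1.0, sigma_prior=2.0))  # mean 3.2, variance 0.8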
Likelihood
[Figure sequence: the likelihood $p(m \mid \theta)$ plotted over the $(\theta, m)$ plane, axes from -8 to 8, shown as a slice for a particular measurement $m$.]
Prior
[Figure: the zero-mean Gaussian prior over $\theta$, on the same axes (-8 to 8).]
Computing the posterior
[Figure: posterior $\propto$ likelihood $\times$ prior, shown as curves over $\theta$ for a particular measurement $m$.]
Making a Bayesian estimate
[Figure: likelihood $\times$ prior $\propto$ posterior; the Bayesian estimate $m^*$ sits between the measurement $m$ and the prior mean, illustrating the bias toward the prior.]
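These figures can be reproduced numerically: multiply likelihood and prior pointwise on a grid, normalize, and read off the posterior mean. A minimal sketch (the measurement value and the noise/prior widths below are assumed for illustration):

import numpy as np

theta = np.linspace(-8, 8, 1601)
m = 4.0                                              # observed measurement
lik = np.exp(-0.5 * (m - theta)**2 / 1.0**2)         # likelihood p(m | theta), noise sd = 1
prior = np.exp(-0.5 * theta**2 / 2.0**2)             # zero-mean Gaussian prior, sd = 2
post = lik * prior
post /= post.sum()                                   # posterior ∝ likelihood x prior

bls = np.sum(theta * post)                           # posterior mean (Bayesian estimate)
print("ML estimate:", m, " Bayes estimate:", bls, " bias:", bls - m)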
High measurement noise: large bias
[Figure: with a broad likelihood, the posterior shifts strongly toward the prior, producing a larger bias.]
Low measurement noise: small bias
[Figure: with a narrow likelihood, the posterior stays close to the measurement, producing only a small bias.]
Bayesian estimation:
• Likelihood and prior combine to form the posterior.
• The Bayesian estimate is always biased toward the prior, relative to the ML estimate.
Application #1: Biases in motion perception
[Figure: two drifting gratings at different contrasts, with a central fixation cross.] Which grating moves faster?
Explanation from Weiss, Simoncelli & Adelson (2002):
[Figure: likelihood, prior, and posterior for high- and low-contrast gratings; the low-contrast likelihood is broader, so its posterior is pulled further toward the zero-motion prior.]
• Noisier measurements make the likelihood broader, so the posterior shifts more toward 0 (the prior favors no motion).
• In the limit of a zero-contrast grating, the likelihood becomes infinitely broad and the percept goes to zero motion.
• Claim: this explains why people actually speed up when driving in fog!
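The Weiss, Simoncelli & Adelson account can be sketched with the same Gaussian machinery: lower contrast corresponds to a broader likelihood, so the posterior-mean speed estimate shrinks more toward the zero-motion prior (the parameter values below are illustrative assumptions, not fits from the paper).

import numpy as np

def perceived_speed(true_speed, likelihood_sd, prior_sd=1.0):
    """Posterior-mean speed for a Gaussian likelihood around true_speed and a zero-motion prior."""
    shrink = prior_sd**2 / (prior_sd**2 + likelihood_sd**2)
    return shrink * true_speed

print(perceived_speed(5.0, likelihood_sd=0.5))   # high contrast: ~4.0, small shift toward 0
print(perceived_speed(5.0, likelihood_sd=2.0))   # low contrast:  ~1.0, large shift toward 0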
Summary
• 3 ingredients for Bayesian estimation (prior, likelihood, loss)
• Bayes' least squares (BLS) estimator (posterior mean)
• maximum a posteriori (MAP) estimator (posterior mode)
• accounts for the stimulus-quality-dependent bias in motion perception (Weiss, Simoncelli & Adelson 2002)