CSE 312 Foundations of Computing II
Lecture 24: Biased Estimation

Stefano Tessaro
tessaro@cs.washington.edu
Parameter Estimation – Workflow

Distribution $\mathbb{P}(x \mid \theta)$ + independent samples $x_1, \dots, x_n$ from $\mathbb{P}(x \mid \theta)$ → Algorithm → parameter estimate $\hat\theta$
($\theta$ = unknown parameter)

Maximum Likelihood Estimation (MLE). Given data $x_1, \dots, x_n$, find $\hat\theta = \hat\theta(x_1, \dots, x_n)$ (“the MLE”) such that $\mathcal{L}(x_1, \dots, x_n \mid \hat\theta)$ is maximized!
Likelihood – Continuous Case

Definition. The likelihood of independent observations $x_1, \dots, x_n$ is

$$\mathcal{L}(x_1, \dots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$$
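In practice the likelihood is evaluated on a log scale, since a product of many densities underflows floating point. Below is a minimal sketch (not from the lecture) evaluating the log-likelihood under an assumed $\mathcal{N}(\theta, 1)$ model; the function name, data, and use of scipy are illustrative assumptions.

```python
# Illustrative sketch (not from the lecture): log-likelihood of i.i.d.
# observations under a hypothetical N(theta, 1) model. We sum log-densities
# rather than multiplying densities, to avoid floating-point underflow.
import numpy as np
from scipy.stats import norm

def log_likelihood(xs, theta):
    # ln L(x_1, ..., x_n | theta) = sum_i ln f(x_i | theta)
    return np.sum(norm.logpdf(xs, loc=theta, scale=1.0))

xs = np.array([0.8, 1.3, 0.2, 1.1])
print(log_likelihood(xs, theta=1.0))  # larger (less negative) than below
print(log_likelihood(xs, theta=5.0))  # a poor parameter guess
```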
Example – Gaussian Parameters

Normal outcomes $x_1, \dots, x_n$, known variance $\sigma^2 = 1$
Goal: MLE for $\mu$ = expectation

$$\mathcal{L}(x_1, \dots, x_n \mid \mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}} e^{-\frac{(x_i - \mu)^2}{2}}$$

$$\ln \mathcal{L}(x_1, \dots, x_n \mid \mu) = -\frac{n}{2} \ln 2\pi - \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2}$$
Example – Gaussian Parameters

Goal: estimate $\mu$ = expectation

$$\ln \mathcal{L}(x_1, \dots, x_n \mid \mu) = -\frac{n}{2} \ln 2\pi - \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2}$$

Note: $\frac{d}{d\mu} \frac{(x_i - \mu)^2}{2} = \frac{1}{2} \cdot 2 \cdot (x_i - \mu) \cdot (-1) = \mu - x_i$

$$\frac{\partial}{\partial \mu} \ln \mathcal{L}(x_1, \dots, x_n \mid \mu) = \sum_{i=1}^{n} (x_i - \mu) = \sum_{i=1}^{n} x_i - n\mu = 0$$

In other words, the MLE $\hat\mu = \frac{\sum_{i=1}^{n} x_i}{n}$ is the population mean of the data.
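As a numerical sanity check (an illustration, not part of the slides): maximizing the log-likelihood over $\mu$ with a generic optimizer recovers the sample mean, matching the closed form above. The seed, data, and scipy optimizer are assumptions of this sketch.

```python
# Sketch: numerically maximize the N(mu, 1) log-likelihood over mu and
# compare against the closed-form MLE (the sample mean).
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
xs = rng.normal(loc=2.5, scale=1.0, size=1000)  # true mu = 2.5

neg_log_lik = lambda mu: -np.sum(norm.logpdf(xs, loc=mu, scale=1.0))
mu_hat = minimize_scalar(neg_log_lik, bounds=(-10, 10), method="bounded").x

print(mu_hat, xs.mean())  # agree up to optimizer tolerance
```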
$n$ samples $x_1, \dots, x_n \in \mathbb{R}$ from Gaussian $\mathcal{N}(\mu, \sigma^2)$. Most likely $\mu$ and $\sigma^2$?

[Figure: Gaussian density curve fit to the samples; vertical axis 0 to 0.5, horizontal axis −4 to 6.]
Two-parameter optimization

Normal outcomes $x_1, \dots, x_n$
Goal: estimate $\theta_1 = \mu$ = expectation and $\theta_2 = \sigma^2$ = variance

$$\mathcal{L}(x_1, \dots, x_n \mid \theta_1, \theta_2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\theta_2}} e^{-\frac{(x_i - \theta_1)^2}{2\theta_2}}$$

$$\ln \mathcal{L}(x_1, \dots, x_n \mid \theta_1, \theta_2) = -\frac{n}{2} \ln(2\pi\theta_2) - \sum_{i=1}^{n} \frac{(x_i - \theta_1)^2}{2\theta_2}$$
Two-parameter estimation

$$\ln \mathcal{L}(x_1, \dots, x_n \mid \theta_1, \theta_2) = -\frac{n}{2} \ln(2\pi\theta_2) - \sum_{i=1}^{n} \frac{(x_i - \theta_1)^2}{2\theta_2}$$

We need to find a solution $(\hat\theta_1, \hat\theta_2)$ to

$$\frac{\partial}{\partial \theta_1} \ln \mathcal{L}(x_1, \dots, x_n \mid \theta_1, \theta_2) = 0$$

$$\frac{\partial}{\partial \theta_2} \ln \mathcal{L}(x_1, \dots, x_n \mid \theta_1, \theta_2) = 0$$
MLE for Expectation

$$\ln \mathcal{L}(x_1, \dots, x_n \mid \theta_1, \theta_2) = -\frac{n}{2} \ln(2\pi\theta_2) - \sum_{i=1}^{n} \frac{(x_i - \theta_1)^2}{2\theta_2}$$

$$\frac{\partial}{\partial \theta_1} \ln \mathcal{L}(x_1, \dots, x_n \mid \theta_1, \theta_2) = \frac{1}{\theta_2} \sum_{i=1}^{n} (x_i - \theta_1) = 0$$

In other words, the MLE of the expectation is $\hat\theta_1 = \frac{\sum_{i=1}^{n} x_i}{n}$, (again) the population mean of the data, regardless of $\theta_2$.

What about the variance?
MLE for Variance

$$\ln \mathcal{L}(x_1, \dots, x_n \mid \hat\theta_1, \theta_2) = -\frac{n}{2} \ln(2\pi\theta_2) - \sum_{i=1}^{n} \frac{(x_i - \hat\theta_1)^2}{2\theta_2}$$

$$= -\frac{n}{2} \ln 2\pi - \frac{n}{2} \ln \theta_2 - \frac{1}{2\theta_2} \sum_{i=1}^{n} (x_i - \hat\theta_1)^2$$

$$\frac{\partial}{\partial \theta_2} \ln \mathcal{L}(x_1, \dots, x_n \mid \hat\theta_1, \theta_2) = -\frac{n}{2\theta_2} + \frac{1}{2\theta_2^2} \sum_{i=1}^{n} (x_i - \hat\theta_1)^2 = 0$$

In other words, the MLE of the variance,
$$\hat\theta_2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat\theta_1)^2,$$
is the population variance of the data.
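The two closed forms are easy to check numerically. A sketch (assuming numpy; the data and seed are illustrative): note that numpy's np.var defaults to ddof=0, i.e., exactly the $1/n$ population variance derived here.

```python
# Sketch: the closed-form Gaussian MLEs derived above, checked against numpy.
import numpy as np

rng = np.random.default_rng(1)
xs = rng.normal(loc=2.0, scale=3.0, size=10_000)  # true mu = 2, sigma^2 = 9

theta1_hat = xs.sum() / len(xs)                        # MLE of the mean
theta2_hat = np.sum((xs - theta1_hat) ** 2) / len(xs)  # MLE of the variance

print(theta1_hat, np.mean(xs))  # identical: MLE of mu is the sample mean
print(theta2_hat, np.var(xs))   # identical: np.var uses ddof=0 (the 1/n form)
```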
So far

• We have implicitly assumed that MLE estimators are always good.
• But why is that really the case?
  – Next: a natural property that is not always satisfied by the MLE
  – And why the MLE is nonetheless “good”
When is an estimator good?

Distribution $\mathbb{P}(x \mid \theta)$ + samples $X_1, \dots, X_n$ from $\mathbb{P}(x \mid \theta)$ → Algorithm → parameter estimate $\hat\Theta_n$
($\theta$ = unknown parameter)

Definition. An estimator is unbiased if for all $n \ge 1$, $\mathbb{E}(\hat\Theta_n) = \theta$.
Example – Coin Flips

Coin-flip outcomes $x_1, \dots, x_n$, with $n_H$ heads and $n_T$ tails. Recall: $\hat\theta = \frac{n_H}{n}$

Fact. $\hat\theta$ is unbiased.

Let $Y_1, \dots, Y_n$ be s.t. $Y_i = 1$ iff $x_i = H$ (and 0 otherwise). In particular, $\mathbb{P}(Y_i = 1) = \theta$.

$$\hat\Theta_n = \frac{1}{n} \sum_{i=1}^{n} Y_i \qquad\qquad \mathbb{E}(\hat\Theta_n) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}(Y_i) = \frac{1}{n} \cdot n \cdot \theta = \theta$$
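Unbiasedness can also be seen empirically. A small simulation sketch (numpy; the values of $\theta$, $n$, and the trial count are illustrative assumptions): averaging $n_H/n$ over many independent repetitions lands on $\theta$.

```python
# Sketch: empirical check that the coin-flip MLE n_H / n is unbiased.
import numpy as np

rng = np.random.default_rng(2)
theta, n, trials = 0.3, 20, 100_000

flips = rng.random((trials, n)) < theta  # each row is one run of n flips
estimates = flips.mean(axis=1)           # n_H / n for each run

print(estimates.mean())  # ~0.3 = theta, as unbiasedness predicts
```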
Notes

• Unbiasedness is not the ultimate goal either
  – Consider the estimator which sets $\hat\Theta_n = 1$ if the first coin toss is heads, and $\hat\Theta_n = 0$ otherwise – regardless of the number of samples.
  – $\mathbb{P}(\hat\Theta_n = 1) = \theta$, hence $\mathbb{E}(\hat\Theta_n) = \theta$: unbiased, yet it never gets any closer to $\theta$ (see the sketch below).
• Generally, we would like instead $\hat\Theta_n \approx \theta$ with high probability as $n \to \infty$.
  – Will discuss this on Monday.
  – Unbiasedness is a step towards this.
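A simulation sketch of this point (parameters are illustrative assumptions): the “first toss” estimator and the sample mean are both unbiased, but only the sample mean concentrates around $\theta$ as $n$ grows.

```python
# Sketch: two unbiased estimators with very different spread.
import numpy as np

rng = np.random.default_rng(3)
theta, n, trials = 0.3, 1000, 50_000

flips = rng.random((trials, n)) < theta
first_flip = flips[:, 0].astype(float)  # 1 iff the first toss is heads
sample_mean = flips.mean(axis=1)        # the MLE n_H / n

print(first_flip.mean(), sample_mean.mean())  # both ~0.3: unbiased
print(first_flip.std(), sample_mean.std())    # ~0.46 vs ~0.014: only one concentrates
```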
Example – Gaussian

Normal outcomes $X_1, \dots, X_n$ iid according to $\mathcal{N}(\mu, \sigma^2)$

$$\hat\Theta_1 = \frac{\sum_{i=1}^{n} X_i}{n} \qquad\qquad \hat\Theta_2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat\Theta_1)^2$$
Example – Gaussian

Normal outcomes $X_1, \dots, X_n$ iid according to $\mathcal{N}(\mu, \sigma^2)$

$$\hat\Theta_1 = \frac{\sum_{i=1}^{n} X_i}{n} \qquad\qquad \mathbb{E}(\hat\Theta_1) = \frac{\sum_{i=1}^{n} \mathbb{E}(X_i)}{n} = \frac{n \cdot \mu}{n} = \mu$$

Therefore: Unbiased!
Example – Gaussian

Assume: $\sigma^2 > 0$
Normal outcomes $X_1, \dots, X_n$ iid according to $\mathcal{N}(\mu, \sigma^2)$

$$\hat\Theta_2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat\Theta_1)^2 \qquad\qquad \hat\Theta_2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \hat\Theta_1)^2 \;\;\text{(Unbiased!)}$$

Example: $n = 1$. Then $\hat\Theta_1 = X_1$ and $\hat\Theta_2 = \frac{1}{1}(X_1 - X_1)^2 = 0$, so $\mathbb{E}(\hat\Theta_2) = 0 \ne \sigma^2$.

Therefore: Biased!

Next time: unbiased estimator proof + more intuition + confidence intervals
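The bias is visible numerically as well. A sketch (numpy; the specific $\sigma^2$, $n$, and trial count are assumptions): averaged over many repetitions, the $1/n$ estimator comes out near $\frac{n-1}{n}\sigma^2$, while the $1/(n-1)$ version comes out near $\sigma^2$.

```python
# Sketch: the 1/n variance estimator is biased low; the 1/(n-1) one is not.
import numpy as np

rng = np.random.default_rng(4)
mu, sigma2, n, trials = 0.0, 4.0, 5, 200_000

xs = rng.normal(mu, np.sqrt(sigma2), size=(trials, n))
pop_var = xs.var(axis=1, ddof=0)   # 1/n: population variance (the MLE)
samp_var = xs.var(axis=1, ddof=1)  # 1/(n-1): sample variance

print(pop_var.mean())   # ~(n-1)/n * sigma^2 = 3.2  -> biased
print(samp_var.mean())  # ~sigma^2 = 4.0            -> unbiased
```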
Example – Consistency

Assume: $\sigma^2 > 0$
Normal outcomes $X_1, \dots, X_n$ iid according to $\mathcal{N}(\mu, \sigma^2)$

$$\hat\Theta_2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat\Theta_1)^2 \qquad\qquad \hat\Theta_2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \hat\Theta_1)^2$$
Population variance – Biased!                      Sample variance – Unbiased!

The left $\hat\Theta_2$ converges to the same value as the right $\hat\Theta_2$, i.e., $\sigma^2$, as $n \to \infty$. The left $\hat\Theta_2$ is “consistent”.
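Consistency can likewise be seen numerically: in the sketch below (numpy; parameter values are illustrative), both estimators approach $\sigma^2$ as $n$ grows, even though the $1/n$ one is biased at every fixed $n$.

```python
# Sketch: both variance estimators converge to sigma^2 as n grows.
import numpy as np

rng = np.random.default_rng(5)
sigma2 = 4.0

for n in (10, 100, 10_000, 1_000_000):
    xs = rng.normal(0.0, np.sqrt(sigma2), size=n)
    print(n, xs.var(ddof=0), xs.var(ddof=1))  # both -> 4.0
```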
Consistent Estimators & MLE

Distribution $\mathbb{P}(x \mid \theta)$ + samples $X_1, \dots, X_n$ from $\mathbb{P}(x \mid \theta)$ → Algorithm → parameter estimate $\hat\Theta_n$
($\theta$ = unknown parameter)

Definition. An estimator is unbiased if $\mathbb{E}(\hat\Theta_n) = \theta$ for all $n \ge 1$.

Definition. An estimator is consistent if $\lim_{n \to \infty} \mathbb{E}(\hat\Theta_n) = \theta$.

Theorem. MLE estimators are consistent. (But not necessarily unbiased)