

  1. CSE 312 Autumn 2012: More on parameter estimation – Bias and Confidence Intervals

  2. Bias

  3. Recall Likelihood Function. P(HHTHH | θ): probability of HHTHH given P(H) = θ, i.e. θ⁴(1 − θ):

     θ      P(HHTHH | θ)
     0.2    0.0013
     0.5    0.0313
     0.8    0.0819  ← max
     0.95   0.0407

     [Plot: P(HHTHH | θ) versus θ, peaking at θ = 0.8]
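
A quick way to reproduce the table above is to evaluate the likelihood θ⁴(1 − θ) on a grid; a minimal sketch in Python (the grid resolution is an arbitrary choice):

```python
import numpy as np

# Likelihood of observing HHTHH when P(H) = theta: four heads, one tail.
def likelihood(theta):
    return theta**4 * (1 - theta)

for theta in [0.2, 0.5, 0.8, 0.95]:
    print(f"theta = {theta:4.2f}   P(HHTHH | theta) = {likelihood(theta):.4f}")

# A grid search confirms the maximum is at theta = 0.8 (= 4/5, the fraction of heads).
grid = np.linspace(0.0, 1.0, 1001)
print("argmax over grid:", grid[np.argmax(likelihood(grid))])
```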

  4. Recall Example 1: n coin flips, x₁, x₂, ..., xₙ; n₀ tails, n₁ heads, n₀ + n₁ = n; θ = probability of heads. Setting dL/dθ = 0 gives θ̂ = n₁/n: the observed fraction of successes in the sample is the MLE of the success probability in the population. (Also verify it's a max, not a min, and not better on the boundary.)
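
For reference, a worked version of the dL/dθ = 0 step the slide cites (this only expands the slide's own claim that the MLE is the observed fraction of heads):

```latex
L(\theta) = \theta^{n_1}(1-\theta)^{n_0}
\quad\Rightarrow\quad
\ln L(\theta) = n_1 \ln\theta + n_0 \ln(1-\theta)

\frac{d}{d\theta}\ln L(\theta) = \frac{n_1}{\theta} - \frac{n_0}{1-\theta} = 0
\quad\Rightarrow\quad
\hat\theta = \frac{n_1}{n_0 + n_1} = \frac{n_1}{n}
```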

  5. (Un-)Bias. A desirable property: an estimator Y of a parameter θ is an unbiased estimator if E[Y] = θ. For the coin example above, the MLE is unbiased: Y = fraction of heads = (Σ_{1≤i≤n} Xᵢ)/n, where Xᵢ is the indicator for heads on the i-th trial, so E[Y] = (Σ_{1≤i≤n} E[Xᵢ])/n = nθ/n = θ by linearity of expectation.
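
A quick simulation sketch of this unbiasedness (θ = 0.3 and n = 20 are hypothetical choices, not from the slides): averaging Y over many repeated samples lands very close to θ.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 0.3, 20, 100_000      # hypothetical parameters for illustration

flips = rng.random((reps, n)) < theta  # reps independent samples of n coin flips
Y = flips.mean(axis=1)                 # Y = fraction of heads in each sample

print("average of Y over all samples:", Y.mean())  # close to theta = 0.3
```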

  6. Are all unbiased estimators equally good? No! E.g., “Ignore all but the 1st flip; if it was H, let Y′ = 1; else Y′ = 0.” Exercise: show this is unbiased. Exercise: if the observed data has at least one H and at least one T, what is the likelihood of the data given the model with θ = Y′?
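
Both Y and Y′ are unbiased, but they are far from equally good: Y′ uses only one flip, so its variance is n times larger. A sketch comparing them under the same hypothetical θ and n as above:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 0.3, 20, 100_000        # hypothetical parameters

flips = rng.random((reps, n)) < theta
Y  = flips.mean(axis=1)                  # MLE: fraction of heads over all n flips
Yp = flips[:, 0].astype(float)           # Y': 1 if the first flip was heads, else 0

print("means:    ", Y.mean(), Yp.mean()) # both near theta (unbiased)
print("variances:", Y.var(),  Yp.var())  # theta(1-theta)/n  vs.  theta(1-theta)
```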

  7. Recall Ex 3: xᵢ ∼ N(μ, σ²), with μ, σ² both unknown. [Figure: likelihood surface over (θ₁, θ₂).] The sample mean is the MLE of the population mean, again. In general, a problem like this results in 2 equations in 2 unknowns. Easy in this case, since θ₂ drops out of the ∂/∂θ₁ = 0 equation.

  8. Recall Ex. 3, (cont.)

     ln L(x₁, x₂, ..., xₙ | θ₁, θ₂) = Σ_{1≤i≤n} [ −(1/2) ln(2πθ₂) − (xᵢ − θ₁)²/(2θ₂) ]

     ∂/∂θ₂ ln L(x₁, x₂, ..., xₙ | θ₁, θ₂) = Σ_{1≤i≤n} [ −1/(2θ₂) + (xᵢ − θ₁)²/(2θ₂²) ] = 0

     ⟹  θ̂₂ = (Σ_{1≤i≤n} (xᵢ − θ̂₁)²)/n = s̄²

     Sample variance is MLE of population variance
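
Concretely, this MLE of the variance divides by n, which matches numpy's default (ddof=0); the data in this sketch is made up purely for illustration:

```python
import numpy as np

x = np.array([2.1, 1.7, 2.4, 1.9, 2.6])      # hypothetical sample

theta1_hat = x.mean()                        # MLE of the mean
theta2_hat = ((x - theta1_hat) ** 2).mean()  # MLE of the variance: divide by n

assert np.isclose(theta2_hat, x.var())       # numpy's var() uses ddof=0 (divide by n) by default
print(theta1_hat, theta2_hat)
```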

  9. Ex. 3, (cont.) Bias? If Y = (Σ_{1≤i≤n} Xᵢ)/n is the sample mean, then E[Y] = (Σ_{1≤i≤n} E[Xᵢ])/n = nμ/n = μ, so the MLE is an unbiased estimator of the population mean. Similarly, (Σ_{1≤i≤n} (Xᵢ − μ)²)/n is an unbiased estimator of σ². Unfortunately, if μ is unknown and estimated from the same data, as above, then θ̂₂ = (Σ_{1≤i≤n} (Xᵢ − θ̂₁)²)/n is a consistent, but biased, estimate of the population variance. (An example of overfitting.) Unbiased estimate (B&T p. 467): (Σ_{1≤i≤n} (Xᵢ − Y)²)/(n − 1). Roughly, since (n − 1)/n → 1 as n → ∞, the biased and unbiased estimates agree in the limit. One moral: MLE is a great idea, but not a magic bullet.
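
A simulation sketch of that bias (μ = 0, σ² = 1, n = 5 are hypothetical): the divide-by-n estimate averages about (n − 1)/n · σ² = 0.8, while the divide-by-(n − 1) estimate averages about σ² = 1.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma2, n, reps = 0.0, 1.0, 5, 200_000           # hypothetical values

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
mle_var      = samples.var(axis=1, ddof=0)           # divide by n   (the MLE; biased)
unbiased_var = samples.var(axis=1, ddof=1)           # divide by n-1 (unbiased)

print("average MLE variance:     ", mle_var.mean())       # ~ 0.8 = (n-1)/n * sigma^2
print("average unbiased variance:", unbiased_var.mean())  # ~ 1.0 = sigma^2
```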

  10. More on Bias of θ̂₂. Biased? Yes. Why? As an extreme, think about n = 1. Then θ̂₂ = 0; probably an underestimate! Also, consider n = 2. Then θ̂₁ is exactly between the two sample points, the position that exactly minimizes the expression for θ₂. Any other choices for θ₁, θ₂ make the likelihood of the observed data slightly lower. But it's actually pretty unlikely that two sample points would be chosen exactly equidistant from, and on opposite sides of, the mean, so the MLE θ̂₂ systematically underestimates θ₂. (But not by much, and the bias shrinks with sample size.)

  11. Confidence Intervals

  12. A Problem With Point Estimates. Think again about estimating the mean of a normal distribution. Sample X₁, X₂, ..., Xₙ. We showed the sample mean Yₙ = (Σ_{1≤i≤n} Xᵢ)/n is an unbiased (and consistent) estimator of the population mean. But with probability 1, it's wrong! Can we say anything about how wrong? E.g., could I find a value Δ such that I'm 95% confident that the true mean is within ±Δ of my estimate?

  13. Yₙ = (Σ_{1≤i≤n} Xᵢ)/n is a random variable; it has a mean and a variance. Assuming the Xᵢ's are i.i.d. normal with mean μ and variance σ²: Var(Yₙ) = Var((Σ_{1≤i≤n} Xᵢ)/n) = (1/n²) Σ_{1≤i≤n} Var(Xᵢ) = (1/n²)(nσ²) = σ²/n. So Pr((√n)|Yₙ − μ|/σ < z) = 2Φ(z) − 1, (z > 0). E.g., Pr((√n)|Yₙ − μ|/σ < 1.96) ≈ 95%. I.e., the true μ is within ±1.96σ/√n of the estimate ~95% of the time.
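
A sketch of computing that interval for one sample (the true μ, σ, and n below are hypothetical; σ is assumed known, as on the slide):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n = 10.0, 2.0, 50            # hypothetical true mean, std dev, sample size

x = rng.normal(mu, sigma, size=n)
y_n = x.mean()                          # point estimate of mu
delta = 1.96 * sigma / np.sqrt(n)       # half-width: P(|Y_n - mu| < delta) ~ 95%

print(f"95% confidence interval for mu: [{y_n - delta:.3f}, {y_n + delta:.3f}]")
# Over many repeated samples, intervals built this way cover the true mu ~95% of the time.
```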
