  1. Unbiased Estimation. The Binomial problem shows a general phenomenon: an estimator can be good for some values of $\theta$ and bad for others. To compare $\hat\theta$ and $\tilde\theta$, two estimators of $\theta$: say $\hat\theta$ is better than $\tilde\theta$ if it has uniformly smaller MSE:
     $$\mathrm{MSE}_{\hat\theta}(\theta) \le \mathrm{MSE}_{\tilde\theta}(\theta) \quad \text{for all } \theta.$$
     Normally we also require that the inequality be strict for at least one $\theta$.
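     To see the phenomenon concretely, here is a minimal Python sketch (not part of the original notes) comparing the MSE of the usual estimator $X/n$ with an illustrative shrinkage estimator $(X+1)/(n+2)$ in the Binomial$(n,p)$ model; the choice $n = 10$ and the second estimator are assumptions made purely for illustration. Neither estimator has uniformly smaller MSE.

```python
import numpy as np

# Sketch: compare the MSE of two estimators of p in the Binomial(n, p)
# model over a grid of p values.  Both estimators and the grid are
# illustrative choices, not part of the original notes.
n = 10
p_grid = np.linspace(0.01, 0.99, 99)

# Estimator 1: the usual p_hat = X/n (unbiased), MSE = variance = p(1-p)/n.
mse_mle = p_grid * (1 - p_grid) / n

# Estimator 2: the shrinkage estimator (X + 1)/(n + 2).
# Bias = (1 - 2p)/(n + 2), variance = n p (1 - p)/(n + 2)^2.
bias = (1 - 2 * p_grid) / (n + 2)
var = n * p_grid * (1 - p_grid) / (n + 2) ** 2
mse_shrink = bias ** 2 + var

# Neither estimator wins everywhere: X/n is better near p = 0 or 1,
# the shrinkage estimator is better near p = 1/2.
print("p where X/n wins:        ", p_grid[mse_mle < mse_shrink][[0, -1]])
print("p where (X+1)/(n+2) wins:", p_grid[mse_shrink < mse_mle][[0, -1]])
```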

  2. Question: is there a best estimate, one which is better than every other estimator? Answer: NO. Suppose $\hat\theta$ were such a best estimate. Fix a $\theta^*$ in $\Theta$ and let $\tilde\theta \equiv \theta^*$. Then the MSE of $\tilde\theta$ is 0 when $\theta = \theta^*$. Since $\hat\theta$ is better than $\tilde\theta$ we must have
     $$\mathrm{MSE}_{\hat\theta}(\theta^*) = 0$$
     so that $\hat\theta = \theta^*$ with probability equal to 1. So $\hat\theta = \tilde\theta$. If there are actually two different possible values of $\theta$ this gives a contradiction; so no such $\hat\theta$ exists.

  3. Principle of Unbiasedness: a good estimate is unbiased, that is, $E_\theta(\hat\theta) \equiv \theta$. WARNING: in my view the Principle of Unbiasedness is a load of hog wash. For an unbiased estimate the MSE is just the variance.
     Definition: An estimator $\hat\phi$ of a parameter $\phi = \phi(\theta)$ is Uniformly Minimum Variance Unbiased (UMVU) if, whenever $\tilde\phi$ is an unbiased estimate of $\phi$, we have
     $$\mathrm{Var}_\theta(\hat\phi) \le \mathrm{Var}_\theta(\tilde\phi).$$
     We call $\hat\phi$ the UMVUE. ('E' is for Estimator.) The point of having $\phi(\theta)$ is to study problems like estimating $\mu$ when you have two parameters, like $\mu$ and $\sigma$, for example.

  4. Cramér-Rao Inequality. Suppose $T(X)$ is some unbiased estimator of $\theta$. We can derive some information from the identity $E_\theta(T(X)) \equiv \theta$. When we worked with the score function we derived some information from the identity $\int f(x,\theta)\,dx \equiv 1$ by differentiation, and we do the same here. Since $T(X)$ is an unbiased estimate of $\theta$,
     $$E_\theta(T(X)) = \int T(x) f(x,\theta)\,dx \equiv \theta.$$
     Differentiate both sides to get
     $$1 = \frac{d}{d\theta}\int T(x) f(x,\theta)\,dx = \int T(x) \frac{\partial}{\partial\theta} f(x,\theta)\,dx = \int T(x) \frac{\partial}{\partial\theta}\log(f(x,\theta))\, f(x,\theta)\,dx = E_\theta(T(X)\, U(\theta)),$$
     where $U$ is the score function.

  5. Remember: $\mathrm{Cov}(W,Z) = E(WZ) - E(W)E(Z)$. Here
     $$\mathrm{Cov}_\theta(T(X), U(\theta)) = E(T(X)U(\theta)) - E(T(X))E(U(\theta)).$$
     But recall that the score $U(\theta)$ has mean 0, so
     $$\mathrm{Cov}_\theta(T(X), U(\theta)) = 1.$$
     The definition of correlation gives
     $$\{\mathrm{Corr}(W,Z)\}^2 = \frac{\{\mathrm{Cov}(W,Z)\}^2}{\mathrm{Var}(W)\,\mathrm{Var}(Z)}.$$
     Squared correlations are at most 1, therefore $\mathrm{Var}_\theta(T)\,\mathrm{Var}_\theta(U(\theta)) \ge 1$. Remember $\mathrm{Var}(U(\theta)) = I(\theta)$. Therefore
     $$\mathrm{Var}_\theta(T) \ge \frac{1}{I(\theta)}.$$
     The RHS is called the Cramér-Rao Lower Bound.

  6. Examples of the Cramér-Rao Lower Bound: 1) $X_1, \ldots, X_n$ iid Exponential with mean $\mu$, so $f(x) = \frac{1}{\mu}\exp\{-x/\mu\}$ for $x > 0$. Log-likelihood:
     $$\ell = -n\log\mu - \sum X_i/\mu.$$
     Score ($U(\mu) = \partial\ell/\partial\mu$):
     $$U(\mu) = -\frac{n}{\mu} + \frac{\sum X_i}{\mu^2}.$$
     Negative second derivative:
     $$V(\mu) = -U'(\mu) = -\frac{n}{\mu^2} + \frac{2\sum X_i}{\mu^3}.$$

  7. Take expected values to compute the Fisher information:
     $$I(\mu) = -\frac{n}{\mu^2} + \frac{2\sum E(X_i)}{\mu^3} = -\frac{n}{\mu^2} + \frac{2n\mu}{\mu^3} = \frac{n}{\mu^2}.$$
     So if $T(X_1,\ldots,X_n)$ is an unbiased estimator of $\mu$ then
     $$\mathrm{Var}(T) \ge \frac{1}{I(\mu)} = \frac{\mu^2}{n}.$$
     Example: $\bar X$ is an unbiased estimate of $\mu$. Also $\mathrm{Var}(\bar X) = \mu^2/n$. So $\bar X$ is the best unbiased estimate: it has the smallest possible variance. We say it is a Uniformly Minimum Variance Unbiased Estimator of $\mu$.
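     A short simulation sketch (not from the notes; the seed, $\mu = 3$, $n = 25$ and the replication count are arbitrary choices) checking numerically that $\mathrm{Var}(\bar X)$ in the exponential model matches the bound $\mu^2/n$.

```python
import numpy as np

# Sketch: simulate to check that Var(X_bar) for an iid Exponential(mean mu)
# sample matches the Cramer-Rao bound mu^2 / n.
rng = np.random.default_rng(0)   # seed chosen arbitrarily
mu, n, reps = 3.0, 25, 200_000

samples = rng.exponential(scale=mu, size=(reps, n))
xbar = samples.mean(axis=1)

print("simulated Var(X_bar):", xbar.var())
print("CRLB mu^2 / n       :", mu ** 2 / n)   # should be close
```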

  8. Similar ideas work with more than one parameter. Example: $X_1,\ldots,X_n$ a sample from $N(\mu, \sigma^2)$. Suppose $(\hat\mu, \hat\sigma^2)$ is an unbiased estimator of $(\mu, \sigma^2)$ (such as $(\bar X, s^2)$). We are estimating $\sigma^2$, not $\sigma$, so give $\sigma^2$ its own symbol: define $\tau = \sigma^2$. Find the information matrix, because the CRLB is its inverse. Log-likelihood:
     $$\ell = -\frac{n}{2}\log\tau - \frac{\sum(X_i - \mu)^2}{2\tau} - \frac{n}{2}\log(2\pi).$$
     Score:
     $$U = \begin{pmatrix} \dfrac{\sum(X_i - \mu)}{\tau} \\[2ex] -\dfrac{n}{2\tau} + \dfrac{\sum(X_i - \mu)^2}{2\tau^2} \end{pmatrix}$$

  9. Negative second derivative matrix:
     $$V = \begin{pmatrix} \dfrac{n}{\tau} & \dfrac{\sum(X_i - \mu)}{\tau^2} \\[2ex] \dfrac{\sum(X_i - \mu)}{\tau^2} & -\dfrac{n}{2\tau^2} + \dfrac{\sum(X_i - \mu)^2}{\tau^3} \end{pmatrix}$$
     Fisher information matrix:
     $$I(\mu, \tau) = \begin{pmatrix} \dfrac{n}{\tau} & 0 \\[1ex] 0 & \dfrac{n}{2\tau^2} \end{pmatrix}$$
     Cramér-Rao lower bound:
     $$\mathrm{Var}(\hat\mu, \hat\sigma^2) \ge \{I(\mu, \tau)\}^{-1} = \begin{pmatrix} \dfrac{\tau}{n} & 0 \\[1ex] 0 & \dfrac{2\tau^2}{n} \end{pmatrix}$$
     In particular
     $$\mathrm{Var}(\hat\sigma^2) \ge \frac{2\tau^2}{n} = \frac{2\sigma^4}{n}.$$
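     As a cross-check, here is a symbolic sketch (not part of the notes; it assumes the sympy library is available) that rederives $I(\mu,\tau)$ and its inverse from the per-observation log density.

```python
import sympy as sp

# Sketch: derive the Fisher information matrix for one N(mu, tau)
# observation, with tau = sigma^2, then scale by n.
x, mu, tau, n = sp.symbols('x mu tau n', positive=True)

loglik = -sp.Rational(1, 2) * sp.log(tau) - (x - mu) ** 2 / (2 * tau) \
         - sp.Rational(1, 2) * sp.log(2 * sp.pi)

# Negative Hessian with respect to (mu, tau).
V = -sp.hessian(loglik, (mu, tau))

# Take expectations: E(x - mu) = 0 and E[(x - mu)^2] = tau.
d = sp.symbols('d')                       # d stands for x - mu
V = V.subs(x, mu + d)
I1 = V.subs([(d**2, tau), (d, 0)])        # per-observation information
I_n = sp.simplify(n * I1)

print(I_n)          # expect Matrix([[n/tau, 0], [0, n/(2*tau**2)]])
print(I_n.inv())    # the CRLB: Matrix([[tau/n, 0], [0, 2*tau**2/n]])
```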

  10. Notice:
     $$\mathrm{Var}(s^2) = \mathrm{Var}\!\left(\frac{\sigma^2}{n-1}\cdot\frac{(n-1)s^2}{\sigma^2}\right) = \frac{\sigma^4}{(n-1)^2}\,\mathrm{Var}(\chi^2_{n-1}) = \frac{2\sigma^4}{n-1} > \frac{2\sigma^4}{n}.$$
     Conclusions: the variance of the sample variance is larger than the lower bound, but the ratio of the variance to the lower bound is $n/(n-1)$, which is nearly 1. Fact: $s^2$ is the UMVUE anyway; see later in the course.
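     A quick simulation sketch (again not from the notes; $\sigma = 2$, $n = 10$ and the seed are arbitrary choices) illustrating that $\mathrm{Var}(s^2)$ sits at $2\sigma^4/(n-1)$, just above the bound $2\sigma^4/n$.

```python
import numpy as np

# Sketch: check Var(s^2) = 2*sigma^4/(n-1) against the CRLB 2*sigma^4/n.
rng = np.random.default_rng(1)
sigma, n, reps = 2.0, 10, 200_000

x = rng.normal(loc=0.0, scale=sigma, size=(reps, n))
s2 = x.var(axis=1, ddof=1)                 # sample variance with divisor n-1

print("simulated Var(s^2)    :", s2.var())
print("theory 2*sigma^4/(n-1):", 2 * sigma ** 4 / (n - 1))
print("CRLB   2*sigma^4/n    :", 2 * sigma ** 4 / n)
```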

  11. Slightly more general: if $E(T) = \phi(\theta)$ for some function $\phi$ then a similar argument gives
     $$\mathrm{Var}(T) \ge \frac{\{\phi'(\theta)\}^2}{I(\theta)}.$$
     The inequality is strict unless the correlation is $\pm 1$, so that
     $$U(\theta) = A(\theta) T(X) + B(\theta)$$
     for non-random constants $A$ and $B$ (which may depend on $\theta$). This would prove that
     $$\ell(\theta) = A^*(\theta) T(X) + B^*(\theta) + C(X)$$
     for other constants $A^*$ and $B^*$, and finally
     $$f(x, \theta) = h(x) e^{A^*(\theta) T(x) + B^*(\theta)}$$
     for $h = e^C$.

  12. Summary of Implications
     • You can recognize a UMVUE sometimes. If $\mathrm{Var}_\theta(T(X)) \equiv 1/I(\theta)$ then $T(X)$ is the UMVUE. In the $N(\mu, 1)$ example the Fisher information is $n$ and $\mathrm{Var}(\bar X) = 1/n$, so that $\bar X$ is the UMVUE of $\mu$.
     • In an asymptotic sense the MLE is nearly optimal: it is nearly unbiased and its (approximate) variance is nearly $1/I(\theta)$.
     • Good estimates are highly correlated with the score.
     • Densities of the exponential form given above (called an exponential family) are somehow special.
     • Usually the inequality is strict: strict unless the score is an affine function of a statistic $T$ and $T$ (or $T/c$ for a constant $c$) is unbiased for $\theta$.

  13. What can we do to find UMVUEs when the CRLB is a strict inequality? Use:
     Sufficiency: choose good summary statistics.
     Completeness: recognize unique good statistics.
     Rao-Blackwell theorem: a mechanical way to improve unbiased estimates.
     Lehmann-Scheffé theorem: a way to prove an estimate is the UMVUE.

  14. Sufficiency. Example: suppose $X_1,\ldots,X_n$ iid Bernoulli($\theta$) and
     $$T(X_1,\ldots,X_n) = \sum X_i.$$
     Consider the conditional distribution of $X_1,\ldots,X_n$ given $T$. Take $n = 4$:
     $$P(X_1=1, X_2=1, X_3=0, X_4=0 \mid T=2) = \frac{P(X_1=1, X_2=1, X_3=0, X_4=0, T=2)}{P(T=2)} = \frac{P(X_1=1, X_2=1, X_3=0, X_4=0)}{P(T=2)} = \frac{p^2(1-p)^2}{\binom{4}{2} p^2(1-p)^2} = \frac{1}{6}.$$
     Notice the disappearance of $p$! This happens for all possibilities for $n$, $T$ and the $X$s.
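     A small enumeration sketch (not part of the notes; the values of $p$ tried are arbitrary) verifying that this conditional probability is $1/6$ no matter what $p$ is.

```python
from itertools import product

# Sketch: enumerate all Bernoulli outcomes for n = 4 and verify that
# P(X1=1, X2=1, X3=0, X4=0 | T=2) = 1/6 whatever p is.
def cond_prob(p, n=4, target=(1, 1, 0, 0)):
    t = sum(target)
    prob = lambda xs: p ** sum(xs) * (1 - p) ** (n - sum(xs))
    p_T = sum(prob(xs) for xs in product([0, 1], repeat=n) if sum(xs) == t)
    return prob(target) / p_T

for p in (0.1, 0.5, 0.9):
    print(p, cond_prob(p))   # prints 1/6 = 0.1666... for every p
```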

  15. In the binomial situation we say the conditional distribution of the data given the summary statistic $T$ is free of $\theta$.
     Defn: A statistic $T(X)$ is sufficient for the model $\{P_\theta;\ \theta \in \Theta\}$ if the conditional distribution of the data $X$ given $T = t$ is free of $\theta$.
     Intuition: data tell us about $\theta$ if different values of $\theta$ give different distributions to $X$. If two different values of $\theta$ correspond to the same density or cdf for $X$, we cannot distinguish these two values of $\theta$ by examining $X$. Extension of this notion: if two values of $\theta$ give the same conditional distribution of $X$ given $T$, then observing $X$ in addition to $T$ doesn't improve our ability to distinguish the two values.

  16. Theorem [Rao-Blackwell]: Suppose $S(X)$ is a sufficient statistic for the model $\{P_\theta,\ \theta \in \Theta\}$. If $T$ is an estimate of $\phi(\theta)$ then:
     1. $E(T \mid S)$ is a statistic.
     2. $E(T \mid S)$ has the same bias as $T$; if $T$ is unbiased so is $E(T \mid S)$.
     3. $\mathrm{Var}_\theta(E(T \mid S)) \le \mathrm{Var}_\theta(T)$, and the inequality is strict unless $T$ is a function of $S$.
     4. The MSE of $E(T \mid S)$ is no more than the MSE of $T$.

  17. Usage: review of conditional distributions.
     Defn: if $X$, $Y$ are rvs with a joint density then
     $$E(g(Y) \mid X = x) = \int g(y) f_{Y|X}(y \mid x)\,dy$$
     and $E(g(Y) \mid X)$ is this function of $x$ evaluated at $x = X$.
     Important property:
     $$E\{R(X) E(Y \mid X)\} = \int R(x) E(Y \mid X = x) f_X(x)\,dx = \iint R(x)\, y\, f_X(x) f(y \mid x)\,dy\,dx = \iint R(x)\, y\, f_{X,Y}(x,y)\,dy\,dx = E(R(X) Y).$$
     Think of $E(Y \mid X)$ as the average of $Y$ holding $X$ fixed. It behaves like an ordinary expected value, but functions of $X$ alone are like constants:
     $$E\left(\sum A_i(X) Y_i \mid X\right) = \sum A_i(X) E(Y_i \mid X).$$
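     A brief simulation sketch (not from the notes) of the property $E\{R(X)E(Y \mid X)\} = E(R(X)Y)$, in an assumed toy model where $E(Y \mid X)$ is known in closed form ($Y = X^2 + \text{noise}$, $R(X) = \cos X$).

```python
import numpy as np

# Sketch: check E{R(X) E(Y|X)} = E(R(X) Y) by simulation in a model where
# E(Y|X) is known.  Here Y = X^2 + noise, with the noise independent of X,
# so E(Y|X) = X^2; R is an arbitrary function of X alone.
rng = np.random.default_rng(2)
m = 1_000_000

X = rng.normal(size=m)
Y = X ** 2 + rng.normal(size=m)     # E(Y | X) = X^2
R = np.cos(X)                       # any function of X alone

print("E{R(X) E(Y|X)}:", np.mean(R * X ** 2))
print("E{R(X) Y}     :", np.mean(R * Y))   # the two agree up to MC error
```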

  18. Examples: in the binomial problem (here the $Y_i$ are the Bernoulli trials and $X = \sum Y_i$), $Y_1(1 - Y_2)$ is an unbiased estimate of $p(1-p)$. We improve this by computing
     $$E(Y_1(1 - Y_2) \mid X).$$
     We do this in two steps. First compute
     $$E(Y_1(1 - Y_2) \mid X = x).$$

  19. Notice that the random variable $Y_1(1 - Y_2)$ is either 1 or 0, so its expected value is just the probability that it equals 1:
     $$E(Y_1(1 - Y_2) \mid X = x) = P(Y_1(1 - Y_2) = 1 \mid X = x) = P(Y_1 = 1, Y_2 = 0 \mid Y_1 + Y_2 + \cdots + Y_n = x)$$
     $$= \frac{P(Y_1 = 1, Y_2 = 0, Y_1 + \cdots + Y_n = x)}{P(Y_1 + Y_2 + \cdots + Y_n = x)} = \frac{P(Y_1 = 1, Y_2 = 0, Y_3 + \cdots + Y_n = x - 1)}{\binom{n}{x} p^x (1-p)^{n-x}}$$
     $$= \frac{p(1-p)\binom{n-2}{x-1} p^{x-1} (1-p)^{(n-2)-(x-1)}}{\binom{n}{x} p^x (1-p)^{n-x}} = \frac{\binom{n-2}{x-1}}{\binom{n}{x}} = \frac{x(n-x)}{n(n-1)}.$$
     This is simply $n\hat p(1 - \hat p)/(n-1)$ (which can be bigger than $1/4$, the maximum value of $p(1-p)$).
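     An enumeration sketch (not from the notes; $n = 6$, $x = 2$ and the values of $p$ are arbitrary choices) confirming that this conditional expectation equals $x(n-x)/(n(n-1))$ and is free of $p$.

```python
from itertools import product

# Sketch: for small n, verify by enumeration that E(Y1 (1 - Y2) | sum = x)
# equals x (n - x) / (n (n - 1)) and does not depend on p.
def cond_mean(p, n, x):
    num = den = 0.0
    for ys in product([0, 1], repeat=n):
        if sum(ys) != x:
            continue
        w = p ** x * (1 - p) ** (n - x)      # same weight for each sequence
        num += ys[0] * (1 - ys[1]) * w
        den += w
    return num / den

n, x = 6, 2
print(cond_mean(0.3, n, x), cond_mean(0.8, n, x))   # both equal...
print(x * (n - x) / (n * (n - 1)))                  # ... x(n-x)/(n(n-1))
```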

  20. Example: if $X_1,\ldots,X_n$ are iid $N(\mu, 1)$ then $\bar X$ is sufficient and $X_1$ is an unbiased estimate of $\mu$. Now
     $$E(X_1 \mid \bar X) = E[X_1 - \bar X + \bar X \mid \bar X] = E[X_1 - \bar X \mid \bar X] + \bar X = \bar X,$$
     since $X_1 - \bar X$ is independent of $\bar X$ and has mean 0. This is (as we will see later) the UMVUE.
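     A final simulation sketch (not from the notes; $\mu = 1.5$, $n = 20$ and the seed are arbitrary) showing the Rao-Blackwell improvement here: both $X_1$ and $\bar X$ are unbiased for $\mu$, but $\bar X$ has variance $1/n$ instead of 1.

```python
import numpy as np

# Sketch: in the N(mu, 1) model both X_1 and X_bar are unbiased for mu,
# and Rao-Blackwellizing X_1 over X_bar gives X_bar.  Compare variances.
rng = np.random.default_rng(3)
mu, n, reps = 1.5, 20, 100_000

x = rng.normal(loc=mu, scale=1.0, size=(reps, n))
x1, xbar = x[:, 0], x.mean(axis=1)

print("mean of X_1, X_bar:", x1.mean(), xbar.mean())        # both near mu
print("Var(X_1)   ~ 1    :", x1.var())
print("Var(X_bar) ~ 1/n  :", xbar.var(), "vs", 1 / n)
```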
