Lecture 23: How to find estimators (§6.2)
We have been discussing the problem of estimating an unknown parameter $\theta$ in a probability distribution when we are given a sample $x_1, x_2, \dots, x_n$ from that distribution. We introduced two examples. First, use the sample mean $\bar{x} = \dfrac{x_1 + \dots + x_n}{n}$ to estimate the population mean $\mu$; $\bar{X}$ is an unbiased estimator of $\mu$.
Second, we had the more subtle problem of estimating $B$ in $U(0, B)$:
$$\hat{B} = \frac{n+1}{n}\,\max(x_1, x_2, \dots, x_n)$$
is an unbiased estimator of $\theta = B$. We discussed two desirable properties of estimators: (i) unbiased, (ii) minimum variance.
Now for the general problem: given a sample, how do you find an estimator $\hat{\theta} = h(x_1, x_2, \dots, x_n)$ for $\theta$? There are two methods: (i) the method of moments, (ii) the method of maximum likelihood.
The Method of Moments

Definition 1. Let $k$ be a nonnegative integer and let $X$ be a random variable. The $k$-th moment $m_k(X)$ of $X$ is given by
$$m_k(X) = E(X^k), \qquad k \ge 0,$$
so $m_0(X) = 1$, $m_1(X) = E(X) = \mu$, and $m_2(X) = E(X^2) = \sigma^2 + \mu^2$.

Definition 2. Let $x_1, x_2, \dots, x_n$ be a sample from $X$. The $k$-th sample moment $S_k$ is
$$S_k = \frac{1}{n}\sum_{i=1}^{n} x_i^k, \qquad \text{so } S_1 = \bar{x}.$$
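The sample moments are easy to compute directly from data. As a quick illustration, here is a minimal Python sketch; the helper name `sample_moment` and the toy data are my own, not from the lecture.

```python
import numpy as np

def sample_moment(x, k):
    """Return the k-th sample moment S_k = (1/n) * sum(x_i ** k)."""
    x = np.asarray(x, dtype=float)
    return np.mean(x ** k)

x = np.array([1.2, 0.7, 2.3, 1.9, 0.5])   # a made-up sample
print(sample_moment(x, 1))                 # S_1 = the sample mean x-bar
print(sample_moment(x, 2))                 # S_2 = average of the squares
```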
Key Point. The $k$-th population moment $m_k(X)$ depends on $\theta$, whereas the $k$-th sample moment does not: it is just the average of the $k$-th powers of the $x$'s. The method of moments says: (i) equate the $k$-th population moment $m_k(X)$ to the $k$-th sample moment $S_k$; (ii) solve the resulting system of equations for $\theta$.
$$(*) \qquad m_k(X) = S_k, \qquad 1 \le k < \infty.$$
We will denote the resulting estimator by $\hat{\theta}_{\text{mme}}$.

Example 1. Estimating $p$ in a Bernoulli distribution. The first population moment $m_1(X)$ is the mean $E(X) = p = \theta$. The first sample moment $S_1$ is the sample mean, so the first equation of $(*)$, $m_1(X) = S_1$, gives $p = \bar{x}$: the sample mean is an estimator for $p$.
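As a small numerical sketch (the simulated data and the value $p = 0.3$ are my own assumptions, not from the text), the method-of-moments estimate of $p$ is just the mean of the 0/1 observations:

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.3
x = rng.binomial(1, p_true, size=1000)   # a simulated Bernoulli(p) sample

p_mme = x.mean()                         # first sample moment = sample mean
print(p_mme)                             # should be close to p_true = 0.3
```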
Example 1 (cont.) Recall that, because the $x$'s are all either 1 or 0,
$$x_1 + \dots + x_n = \#\text{ of successes} \quad\text{and}\quad \bar{x} = \frac{\#\text{ of successes}}{n} = \text{the sample proportion},$$
so $\hat{p}_{\text{mme}} = \bar{X}$.

Example 2. The method of moments works well when you have several unknown parameters. Suppose we want to estimate both the mean $\mu$ and the variance $\sigma^2$ of a normal distribution (or any distribution), $X \sim N(\mu, \sigma^2)$.
Example 2 (cont.) We equate the first two population moments to the first two sample moments:
$$m_1(X) = S_1, \qquad m_2(X) = S_2,$$
so
$$\mu = \bar{x}, \qquad \sigma^2 + \mu^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2.$$
Solving (we get $\mu$ for free: $\hat{\mu}_{\text{mme}} = \bar{X}$),
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{\sum x_i}{n}\right)^2 = \frac{1}{n}\left(\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\Big(\sum x_i\Big)^2\right).$$
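As a sketch of how this works on data (the normal model, the parameter values, and the simulation setup are my own assumptions), the method-of-moments estimates are the sample mean and the divide-by-$n$ variance; averaging over many samples already hints at the bias discussed next:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, n, reps = 5.0, 4.0, 10, 100_000

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
mu_mme  = samples.mean(axis=1)            # mu-hat = x-bar
var_mme = samples.var(axis=1, ddof=0)     # sigma^2-hat, divide by n
var_s2  = samples.var(axis=1, ddof=1)     # sample variance, divide by n-1

print(mu_mme.mean())    # about 5.0: unbiased
print(var_mme.mean())   # about (n-1)/n * sigma2 = 3.6: biased low
print(var_s2.mean())    # about sigma2 = 4.0: unbiased
```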
Example 2 (cont.) So
$$\hat{\sigma}^2_{\text{mme}} = \frac{1}{n}\left(\sum x_i^2 - \frac{\big(\sum x_i\big)^2}{n}\right).$$
Actually the best estimator for $\sigma^2$ is the sample variance
$$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - \frac{\big(\sum x_i\big)^2}{n}\right);$$
$\hat{\sigma}^2_{\text{mme}}$ is a biased estimator.

Example 3. Estimating $B$ in $U(0, B)$. Recall that we came up with the unbiased estimator
$$\hat{B} = \frac{n+1}{n}\,\max(x_1, x_2, \dots, x_n).$$
Put $w = \max(x_1, \dots, x_n)$.
What do we get from the method of moments? We have
$$E(X) = \frac{0 + B}{2} = \frac{B}{2}.$$
So equating the first population moment $m_1(X) = \mu$ to the first sample moment $S_1 = \bar{x}$, we get
$$\frac{B}{2} = \bar{x}, \qquad B = 2\bar{x}, \qquad \text{so } \hat{B}_{\text{mme}} = 2\bar{X}.$$
This is unbiased because $E(\bar{X}) = \text{population mean} = \frac{B}{2}$, so $E(2\bar{X}) = B$.
So we have a new unbiased estimator $\hat{B}_1 = \hat{B}_{\text{mme}} = 2\bar{X}$. Recall the other was
$$\hat{B}_2 = \frac{n+1}{n}\,W, \qquad \text{where } W = \max(X_1, \dots, X_n).$$
Which one is better? We will interpret this to mean: which one has the smaller variance?
$V(\hat{B}_1) = V(2\bar{X})$. Recall from the distributions handout that
$$X \sim U(A, B) \;\Rightarrow\; V(X) = \frac{(B-A)^2}{12}.$$
Now $X \sim U(0, B)$, so $V(X) = \frac{B^2}{12}$. This is the population variance. We also know
$$V(\bar{X}) = \frac{\sigma^2}{n} = \frac{\text{population variance}}{n}, \qquad \text{so } V(\bar{X}) = \frac{B^2}{12n}.$$
Then
$$V(\hat{B}_1) = V(2\bar{X}) = \frac{4B^2}{12n} = \frac{B^2}{3n}.$$
$$V(\hat{B}_2) = V\!\left(\frac{n+1}{n}\,\max(X_1, \dots, X_n)\right).$$
We have $W = \max(X_1, X_2, \dots, X_n)$, and from Problem 32, p. 252,
$$E(W) = \frac{n}{n+1}\,B \qquad \text{and} \qquad f_W(w) = \begin{cases} \dfrac{n w^{n-1}}{B^n}, & 0 \le w \le B,\\[4pt] 0, & \text{otherwise.} \end{cases}$$
Hence
$$E(W^2) = \int_0^B w^2\,\frac{n w^{n-1}}{B^n}\,dw = \frac{n}{B^n}\int_0^B w^{n+1}\,dw = \frac{n}{B^n}\left[\frac{w^{n+2}}{n+2}\right]_{w=0}^{w=B} = \frac{n}{n+2}\,B^2.$$
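A quick simulation sketch (the values $B = 10$ and $n = 5$ are my own choices, not from the text) to sanity-check the formulas $E(W) = \frac{n}{n+1}B$ and $E(W^2) = \frac{n}{n+2}B^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
B, n, reps = 10.0, 5, 200_000

W = rng.uniform(0, B, size=(reps, n)).max(axis=1)   # W = max of n uniforms

print(W.mean(),      n / (n + 1) * B)       # both about 8.33
print((W**2).mean(), n / (n + 2) * B**2)    # both about 71.4
```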
Hence
$$V(W) = E(W^2) - E(W)^2 = \frac{n}{n+2}\,B^2 - \left(\frac{n}{n+1}\,B\right)^2 = B^2\left[\frac{n}{n+2} - \frac{n^2}{(n+1)^2}\right]$$
$$= B^2\,\frac{n(n+1)^2 - n^2(n+2)}{(n+1)^2(n+2)} = B^2\,\frac{n^3 + 2n^2 + n - n^3 - 2n^2}{(n+1)^2(n+2)} = \frac{n}{(n+1)^2(n+2)}\,B^2.$$
Therefore
$$V(\hat{B}_2) = V\!\left(\frac{n+1}{n}\,W\right) = \frac{(n+1)^2}{n^2}\,V(W) = \frac{(n+1)^2}{n^2}\cdot\frac{n}{(n+1)^2(n+2)}\,B^2 = \frac{B^2}{n(n+2)}.$$
$\hat{B}_2$ is the winner: $V(\hat{B}_2) = \frac{B^2}{n(n+2)} \le \frac{B^2}{3n} = V(\hat{B}_1)$, since $n + 2 \ge 3$ whenever $n \ge 1$. If $n = 1$ they tie, but of course $n \gg 1$, so $\hat{B}_2$ is a lot better.
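Here is a simulation sketch of the comparison (again with my own choices $B = 10$, $n = 5$): both estimators average out to $B$, but $\hat{B}_2$ has the much smaller variance, matching $B^2/(3n)$ and $B^2/(n(n+2))$.

```python
import numpy as np

rng = np.random.default_rng(3)
B, n, reps = 10.0, 5, 200_000

x = rng.uniform(0, B, size=(reps, n))
B1 = 2 * x.mean(axis=1)                  # B1-hat = 2 * x-bar
B2 = (n + 1) / n * x.max(axis=1)         # B2-hat = (n+1)/n * max

print(B1.mean(), B2.mean())              # both about 10: unbiased
print(B1.var(),  B**2 / (3 * n))         # about 6.67
print(B2.var(),  B**2 / (n * (n + 2)))   # about 2.86
```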
The Method of Maximum Likelihood (a brilliant idea)

Suppose we have an actual sample $x_1, x_2, \dots, x_n$ from a discrete random variable $X$ whose pmf $p_X(x, \theta)$ depends on an unknown parameter $\theta$. What is the probability $P$ of getting the sample $x_1, x_2, \dots, x_n$ that we actually obtained? It is
$$P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = P(X_1 = x_1)\,P(X_2 = x_2)\cdots P(X_n = x_n)$$
by independence.
But since $X_1, X_2, \dots, X_n$ are samples from $X$, they have the same pmf as $X$, so
$$P(X_1 = x_1) = P(X = x_1) = p_X(x_1, \theta), \quad \dots, \quad P(X_n = x_n) = P(X = x_n) = p_X(x_n, \theta).$$
Hence
$$P = p_X(x_1, \theta)\,p_X(x_2, \theta)\cdots p_X(x_n, \theta).$$
$P$ is a function of $\theta$; it is called the likelihood function and denoted $L(\theta)$. It is the likelihood of getting the sample we actually obtained.
Note that $\theta$ is unknown but $x_1, x_2, \dots, x_n$ are known (given). So what is the best guess for $\theta$? The number that maximizes the probability of getting the sample we actually observed. This is the value of $\theta$ that is most compatible with the observed data.

Bottom Line: find the value of $\theta$ that maximizes the likelihood function $L(\theta)$. This is the "method of maximum likelihood".
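As a minimal sketch of the idea (the five observations are made up, not from the text): evaluate $L(\theta)$ on a grid of candidate values and take the $\theta$ that makes the observed sample most probable. Here the data are Bernoulli, using the pmf worked out in Example 1 below.

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0])            # observed 0/1 sample (made up)
theta = np.linspace(0.01, 0.99, 981)     # grid of candidate theta values

# L(theta) = product over i of theta^x_i * (1 - theta)^(1 - x_i)
L = np.prod(theta[:, None] ** x * (1 - theta[:, None]) ** (1 - x), axis=1)

print(theta[np.argmax(L)])               # about 0.6, the sample mean
```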
The resulting estimator will be called the maximum likelihood estimator, abbreviated mle and denoted $\hat{\theta}_{\text{mle}}$.

Remark (we will be lazy). In doing problems, following the text, we won't really maximize $L(\theta)$; we will just find a critical point of $L(\theta)$, i.e. a point where $L'(\theta)$ is zero. Later in your career, if you have to do this, you should check that the critical point is indeed a maximum.
Examples

1. The mle for $p$ in $\text{Bin}(1, p)$. $X \sim \text{Bin}(1, p)$ means the pmf of $X$ is $P(X = 0) = 1 - p$ and $P(X = 1) = p$. There is a simple formula for this:
$$p_X(x) = p^x (1-p)^{1-x}, \qquad x = 0, 1.$$
Now since $p$ is our unknown parameter $\theta$, we write
$$p_X(x, \theta) = \theta^x (1-\theta)^{1-x}, \qquad x = 0, 1,$$
so
$$p_X(x_1, \theta) = \theta^{x_1}(1-\theta)^{1-x_1}, \quad \dots, \quad p_X(x_n, \theta) = \theta^{x_n}(1-\theta)^{1-x_n}.$$
Hence $L(\theta) = p_X(x_1, \theta)\cdots p_X(x_n, \theta)$, and so
$$L(\theta) = \theta^{x_1}(1-\theta)^{1-x_1}\,\theta^{x_2}(1-\theta)^{1-x_2}\cdots\theta^{x_n}(1-\theta)^{1-x_n},$$
a positive number. Now we want to
$$(*) \qquad \text{1. compute } L'(\theta); \quad \text{2. set } L'(\theta) = 0 \text{ and solve for } \theta \text{ in terms of } x_1, x_2, \dots, x_n.$$
We can make things much simpler by using the following trick. Suppose $f(x)$ is a real-valued function that takes only positive values. Put $h(x) = \ln f(x)$.
Then $h'(x) = \dfrac{f'(x)}{f(x)}$, so the critical points of $h$ are the same as those of $f$:
$$h'(x) = 0 \;\Leftrightarrow\; \frac{f'(x)}{f(x)} = 0 \;\Leftrightarrow\; f'(x) = 0.$$
Also, $h$ takes a maximum value at $x^*$ $\Leftrightarrow$ $f$ takes a maximum value at $x^*$. This is because $\ln$ is an increasing function, so it preserves order relations ($a < b \Leftrightarrow \ln a < \ln b$, where we assume $a > 0$ and $b > 0$).

Bottom Line: change $(*)$ to
$$(**) \qquad \text{compute } (\ln L(\theta))', \text{ set it to zero, and solve for } \theta.$$
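Continuing the toy Bernoulli example (my own simulation, not from the text): working with $\ln L(\theta)$ gives the same maximizer as $L(\theta)$, and the sum is numerically far better behaved than a product of 2000 small factors.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.binomial(1, 0.3, size=2000)          # a larger Bernoulli sample
theta = np.linspace(0.001, 0.999, 999)       # grid of candidate theta values

# ln L(theta) = sum_i [ x_i ln(theta) + (1 - x_i) ln(1 - theta) ]
logL = x.sum() * np.log(theta) + (len(x) - x.sum()) * np.log(1 - theta)

print(theta[np.argmax(logL)], x.mean())      # the two agree (about 0.3)
```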