
Chapter 8.3. Maximum Likelihood Estimation. Prof. Tesler, Math 283 (slide presentation)



  1. Chapter 8.3. Maximum Likelihood Estimation. Prof. Tesler, Math 283, Fall 2019.

  2. Estimating parameters. Let $Y$ be a random variable with a distribution of known type but unknown parameter value $\theta$: for example, Bernoulli or geometric with unknown $p$, or Poisson with unknown mean $\mu$. Denote the pdf of $Y$ by $P_Y(y; \theta)$ to emphasize that there is a parameter $\theta$. Do $n$ independent trials to get data $y_1, y_2, y_3, \ldots, y_n$. The joint pdf is
$$P_{Y_1, \ldots, Y_n}(y_1, \ldots, y_n; \theta) = P_Y(y_1; \theta) \cdots P_Y(y_n; \theta)$$
Goal: use the data to estimate $\theta$.
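As a quick illustration (not part of the original slides), here is a minimal Python sketch that evaluates this product formula for Poisson data; the data values and the parameter value below are made up.

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical data y_1, ..., y_n and a candidate parameter value mu
# (both made up for illustration).
y = np.array([1, 0, 2, 3, 0])
mu = 1.5

# Independence lets the joint pmf factor into a product of marginals:
#   P_{Y_1,...,Y_n}(y_1,...,y_n; mu) = P_Y(y_1; mu) * ... * P_Y(y_n; mu)
joint = np.prod(poisson.pmf(y, mu))
print(joint)
```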

  3. Likelihood function. Previously, we knew the parameter $\theta$ and regarded the $y$'s as unknowns (occurring with certain probabilities). Define the likelihood of $\theta$ given data $y_1, \ldots, y_n$ to be
$$L(\theta; y_1, \ldots, y_n) = P_{Y_1, \ldots, Y_n}(y_1, \ldots, y_n; \theta) = P_Y(y_1; \theta) \cdots P_Y(y_n; \theta)$$
It's the exact same formula as the joint pdf; the difference is the interpretation. Now the data $y_1, \ldots, y_n$ are given while $\theta$ is unknown.
Definition (Maximum Likelihood Estimate, or MLE): the value $\theta = \hat\theta$ that maximizes $L$ is the Maximum Likelihood Estimate. Often it is found using calculus by locating a critical point:
$$\frac{dL}{d\theta} = 0, \qquad \frac{d^2 L}{d\theta^2} < 0$$
However, be sure to check for complications such as discontinuities and boundary values of $\theta$.
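To make the change of viewpoint concrete, here is a small Python sketch: the same product formula, but evaluated on a grid of parameter values with the data held fixed, then maximized. The Bernoulli data are made up, and a real analysis would also check boundaries and the second-derivative condition, as the slide warns.

```python
import numpy as np
from scipy.stats import bernoulli

# Fixed, made-up Bernoulli data; theta (= p) is now the unknown.
y = np.array([1, 0, 1, 1, 0, 1])

# Same formula as the joint pmf, now viewed as a function of theta.
thetas = np.linspace(0.001, 0.999, 999)
L = np.array([np.prod(bernoulli.pmf(y, p)) for p in thetas])

theta_hat = thetas[np.argmax(L)]
print(theta_hat, y.mean())  # for Bernoulli, the MLE is the sample proportion
```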

  4. MLE for the Poisson distribution. $Y$ has a Poisson distribution with unknown parameter $\mu \geq 0$. Collect data from independent trials: $Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n$. The likelihood is
$$L(\mu; y_1, \ldots, y_n) = \prod_{i=1}^{n} \frac{e^{-\mu} \mu^{y_i}}{y_i!} = \frac{e^{-n\mu}\, \mu^{y_1 + \cdots + y_n}}{y_1! \cdots y_n!}$$
The log likelihood is maximized at the same $\mu$ and is easier to use:
$$\ln L(\mu; y_1, \ldots, y_n) = -n\mu + (y_1 + \cdots + y_n)\ln\mu - \ln(y_1! \cdots y_n!)$$
Critical point: solve $d(\ln L)/d\mu = 0$:
$$\frac{d(\ln L)}{d\mu} = -n + \frac{y_1 + \cdots + y_n}{\mu} = 0 \quad\text{so}\quad \mu = \frac{y_1 + \cdots + y_n}{n}$$
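The critical-point calculation can be checked numerically. A minimal sketch with made-up counts, using scipy's gammaln to supply the log factorials as gammaln(y + 1):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

y = np.array([2, 0, 1, 3, 1, 0])  # made-up Poisson counts

# ln L(mu) = -n*mu + (y_1 + ... + y_n) ln(mu) - ln(y_1! ... y_n!)
def neg_log_like(mu):
    return -(-len(y) * mu + y.sum() * np.log(mu) - gammaln(y + 1).sum())

res = minimize_scalar(neg_log_like, bounds=(1e-9, 20.0), method="bounded")
print(res.x)             # numerical maximizer of ln L
print(y.sum() / len(y))  # closed-form critical point (y_1 + ... + y_n) / n
```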

  5. MLE for the Poisson distribution, continued. Recall that the log likelihood is
$$\ln L(\mu; y_1, \ldots, y_n) = -n\mu + (y_1 + \cdots + y_n)\ln\mu - \ln(y_1! \cdots y_n!)$$
with critical point $\mu = (y_1 + \cdots + y_n)/n$. Check that the second derivative is negative there:
$$\frac{d^2(\ln L)}{d\mu^2} = -\frac{y_1 + \cdots + y_n}{\mu^2} = -\frac{n^2}{y_1 + \cdots + y_n} < 0$$
provided $y_1 + \cdots + y_n > 0$. So it's a max unless $y_1 + \cdots + y_n = 0$.
Boundaries for the range $\mu \geq 0$: must check $\mu \to 0^+$ and $\mu \to \infty$. Both send $\ln L \to -\infty$, so the $\mu$ identified above gives the max.
The Maximum Likelihood Estimate for the Poisson distribution:
$$\hat\mu = \frac{y_1 + \cdots + y_n}{n} = \frac{0 \cdot (\#\text{ of } 0\text{'s}) + 1 \cdot (\#\text{ of } 1\text{'s}) + 2 \cdot (\#\text{ of } 2\text{'s}) + \cdots}{n}$$
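The value-counts form of the formula is easy to verify; a small sketch with made-up counts:

```python
import numpy as np

y = np.array([2, 0, 1, 3, 1, 0])  # made-up counts

# mu_hat = (0*(# of 0's) + 1*(# of 1's) + 2*(# of 2's) + ...) / n
counts = np.bincount(y)
mu_hat = (np.arange(len(counts)) * counts).sum() / len(y)
print(mu_hat, y.mean())  # both forms agree
```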

  6. MLE for the Poisson distribution: the exceptional case. The exceptional case on the previous slide was $y_1 + \cdots + y_n = 0$, giving $y_1 = \cdots = y_n = 0$ (since all $y_i \geq 0$). In this case,
$$\ln L(\mu; y_1, \ldots, y_n) = -n\mu + (y_1 + \cdots + y_n)\ln\mu - \ln(y_1! \cdots y_n!) = -n\mu + 0 \cdot \ln\mu - \ln(0! \cdots 0!) = -n\mu$$
On the range $\mu \geq 0$, this is maximized at $\hat\mu = 0$, which agrees with the main formula:
$$\hat\mu = \frac{y_1 + \cdots + y_n}{n} = \frac{0 + \cdots + 0}{n} = 0$$

  7. Repeating the estimation gives different results. Scenario: in a lab class, each student does 10 trials of an experiment and averages them. How do their results compare? Student A does $n$ trials $y_{A1}, y_{A2}, \ldots, y_{An}$, leading to MLE $\hat\theta_A$; B does $n$ trials $y_{B1}, y_{B2}, \ldots, y_{Bn}$, leading to MLE $\hat\theta_B$; etc. How do $\hat\theta_A, \hat\theta_B, \ldots$ compare? Treat the $n$ trials in each experiment as random variables $Y_1, \ldots, Y_n$ and the MLE as a random variable $\hat\Theta$.
Estimate the Poisson parameter with $n = 10$ trials (secret: $\mu = 1.23$):

Experiment   Y1   Y2   Y3   Y4   Y5   Y6   Y7   Y8   Y9   Y10   MLE (Θ̂)
A             1    0    0    0    3    0    2    2    0    2      1.0
B             1    2    0    1    1    3    0    0    0    1      0.9
C             3    2    2    1    1    1    1    2    1    1      1.5
D             1    2    1    2    1    4    2    3    2    1      1.9
E             0    3    0    1    1    0    0    1    2    2      1.0
Mean         1.2  1.8  0.6  1.0  1.4  1.6  1.0  1.6  1.0  1.4    1.26
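A table like this can be regenerated by simulation. A minimal sketch (the seed and rng usage are my choices, not from the slides): each row is one student's experiment of n = 10 Poisson(1.23) trials, and the MLE is the row mean.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily
mu_true, n, n_experiments = 1.23, 10, 5  # matches the slide's setup

# Each row is one experiment: n independent Poisson(mu_true) trials.
data = rng.poisson(mu_true, size=(n_experiments, n))
mle = data.mean(axis=1)  # one MLE per experiment

print(data)
print(mle)         # the estimates vary from experiment to experiment
print(mle.mean())  # their average is typically close to mu_true
```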

  8. Desirable properties of an estimator $\hat\Theta$: it should be narrowly distributed around the correct value of $\theta$; increasing $n$ should improve the estimate; and the distribution of $\hat\Theta$ should be known. The MLE often has these properties (though not always!).

  9. Bias. Suppose $Y$ is Poisson with secret parameter $\mu$. The Poisson MLE from the data is
$$\hat\mu = \frac{Y_1 + \cdots + Y_n}{n}$$
If many MLEs are computed from independent data sets, their average tends to
$$E(\hat\mu) = E\left(\frac{Y_1 + \cdots + Y_n}{n}\right) = \frac{E(Y_1) + \cdots + E(Y_n)}{n} = \frac{\mu + \cdots + \mu}{n} = \frac{n\mu}{n} = \mu$$
Since $E(\hat\mu) = \mu$, we say $\hat\mu$ is an unbiased estimator of $\mu$.
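Unbiasedness can be seen empirically by averaging many MLEs; a quick sketch (parameters chosen to match the earlier slide):

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true, n, reps = 1.23, 10, 100_000

# Compute the MLE on many independent data sets, then average.
mles = rng.poisson(mu_true, size=(reps, n)).mean(axis=1)
print(mles.mean())  # close to mu_true, illustrating E(mu_hat) = mu
```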

  10. Bias, continued. If $E(\hat\mu) = \mu$, then $\hat\mu$ is an unbiased estimator of $\mu$; but if $E(\hat\mu) \neq \mu$, then $\hat\mu$ is a biased estimator of $\mu$. Contrived example: the estimator $\hat\mu' = 2Y_1$ has $E(\hat\mu') = 2\mu$, so it's biased (unless $\mu = 0$). We will soon see an example (normal distribution) where the MLE gives a biased estimator.
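The same kind of empirical check exposes the bias of the contrived estimator; a short sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true, reps = 1.23, 100_000

# The contrived estimator mu_hat' = 2*Y_1 uses only the first trial.
y1 = rng.poisson(mu_true, size=reps)
print((2 * y1).mean())  # close to 2*mu_true, not mu_true: biased
```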

  11. Efficiency (we want estimates to have small spread).
Increasing $n$: continue with the Poisson MLE $\hat\mu = (Y_1 + \cdots + Y_n)/n$ and secret mean $\mu$. The variance is
$$\mathrm{Var}(\hat\mu) = \mathrm{Var}\left(\frac{Y_1 + \cdots + Y_n}{n}\right) = \frac{\mathrm{Var}(Y_1) + \cdots + \mathrm{Var}(Y_n)}{n^2} = \frac{n\,\mathrm{Var}(Y_1)}{n^2} = \frac{\mathrm{Var}(Y_1)}{n} = \frac{\mu}{n}$$
Increasing $n$ makes the variance smaller ($\hat\mu$ is more efficient).
Another estimator: set $\hat\mu' = (Y_1 + 2Y_2)/3$ (and ignore $Y_3, \ldots, Y_n$). Then
$$E(\hat\mu') = \frac{\mu + 2\mu}{3} = \mu \quad\text{(so unbiased)}$$
$$\mathrm{Var}(\hat\mu') = \frac{\mathrm{Var}(Y_1) + 4\,\mathrm{Var}(Y_2)}{9} = \frac{\mu + 4\mu}{9} = \frac{5\mu}{9}$$
so it has higher variance (is less efficient) than the MLE.
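Both variance formulas can be checked by simulation; a sketch comparing the two estimators (parameters as before):

```python
import numpy as np

rng = np.random.default_rng(3)
mu_true, n, reps = 1.23, 10, 100_000

data = rng.poisson(mu_true, size=(reps, n))
mle = data.mean(axis=1)                  # uses all n trials
alt = (data[:, 0] + 2 * data[:, 1]) / 3  # uses only Y_1 and Y_2

print(mle.var(), mu_true / n)      # ~ mu/n
print(alt.var(), 5 * mu_true / 9)  # ~ 5*mu/9: larger, so less efficient
```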
