  1. Machine Learning Estimation Hamid R. Rabiee Spring 2015 http://ce.sharif.edu/courses//93-94/2/ce717-1/

  2. Agenda
     - Introduction
     - Maximum Likelihood Estimation
     - Maximum A Posteriori Estimation
     - Bayesian Estimators

  3. Density Estimation
     - Model the probability distribution p(x) of a random variable x, given a finite set x1, ..., xN of observations.
     - A good estimator is:
       - Unbiased: the sampling distribution of the estimator is centered on the true parameter value.
       - Efficient: it has the smallest possible standard error compared to other estimators.
     - Methods for parameter estimation:
       - Maximum Likelihood Estimation (MLE)
       - Maximum A Posteriori estimation (MAP)

  4. Likelihood Function
     - Consider n independent observations of x: x1, ..., xn, where x follows f(x; θ). The joint pdf of the whole data sample is the product f(x1; θ) · f(x2; θ) ··· f(xn; θ).
     - Now evaluate this joint pdf at the observed sample and regard it as a function of the parameter(s). This is the likelihood function L(θ) = ∏ f(xi; θ), with the xi held constant.
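
To make the "function of the parameter" view concrete, here is a minimal sketch (not from the slides) that evaluates the likelihood of a small, hypothetical Bernoulli sample on a grid of candidate parameter values; the data and the grid are illustrative assumptions.

```python
# Minimal sketch: treat the observed data as fixed and evaluate
# L(theta) = prod_i f(x_i; theta) for a range of candidate theta values.
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 1])     # hypothetical iid Bernoulli(theta) sample
thetas = np.linspace(0.01, 0.99, 99)        # candidate parameter values

# Bernoulli pmf: f(x; theta) = theta^x * (1 - theta)^(1 - x)
likelihood = np.array([np.prod(t ** x * (1 - t) ** (1 - x)) for t in thetas])

theta_hat = thetas[np.argmax(likelihood)]   # grid maximizer; close to the sample mean
print(theta_hat, x.mean())
```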

  5. Maximum Likelihood Estimation (MLE)
     - Likelihood function: L(θ | x) = ∏ f(xi; θ).
     - For each sample point x, let θ̂(x) be the parameter value at which L(θ | x) attains its maximum as a function of θ.
     - The MLE of θ based on a sample x is θ̂(x).
     - The MLE is the parameter point for which the observed sample is most likely.
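
In standard notation, consistent with the slide's definition, the MLE can be written as:

```latex
\hat{\theta}(x) \;=\; \arg\max_{\theta} L(\theta \mid x)
               \;=\; \arg\max_{\theta} \prod_{i=1}^{n} f(x_i ; \theta)
```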

  6. Maximum Likelihood Estimation (MLE)
     - If the likelihood function is differentiable (in θj), candidate MLEs are the values (θ1, ..., θk) that solve the likelihood equations given below.
     - Note that the solutions are only candidates. To confirm the MLE we should also check the second-order conditions and the boundary of the parameter space.
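
The likelihood equations referenced above, written out in standard notation (a sketch of the usual formulation, not necessarily the slide's exact rendering):

```latex
\frac{\partial}{\partial \theta_j}\, L(\theta_1, \dots, \theta_k \mid x) \;=\; 0,
\qquad j = 1, \dots, k
% Equivalently, set the partial derivatives of the log-likelihood
% \ell(\theta) = \log L(\theta \mid x) to zero.
```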

  7. Example 1 (adopted from slides of Harvard University)

  8. Example 2: MLE for a Gaussian with unknown mean
     - Let x1, x2, ..., xn be iid samples from N(θ, 1). Find the MLE of θ.
     - Solution: see the sketch below.
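
A standard derivation under the slide's assumptions (iid samples from N(θ, 1)); this is a sketch and not necessarily the slide's own steps:

```latex
\ell(\theta) \;=\; \log \prod_{i=1}^{n} \tfrac{1}{\sqrt{2\pi}}\, e^{-(x_i-\theta)^2/2}
            \;=\; -\tfrac{n}{2}\log(2\pi) \;-\; \tfrac{1}{2}\sum_{i=1}^{n}(x_i-\theta)^2
\frac{d\ell}{d\theta} \;=\; \sum_{i=1}^{n}(x_i-\theta) \;=\; 0
\;\Longrightarrow\; \hat{\theta}_{\mathrm{MLE}} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i \;=\; \bar{x}
```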

  9. Maximum Likelihood Estimation (MLE)
     - Sometimes it is more convenient to work with the log-likelihood.
     - Let x1, x2, ..., xn be iid samples from Bernoulli(p); the likelihood function is the product of the individual Bernoulli pmfs (written out below).
     - Invariance: if θ̂ is the MLE of θ, then for any function τ(θ), the MLE of τ(θ) is τ(θ̂).
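
The Bernoulli likelihood referenced above, with its maximizer; this is the usual textbook result, written out as a sketch:

```latex
L(p) \;=\; \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i}
     \;=\; p^{\sum_i x_i}\,(1-p)^{\,n-\sum_i x_i}
\ell(p) \;=\; \Big(\sum_i x_i\Big)\log p \;+\; \Big(n-\sum_i x_i\Big)\log(1-p)
\;\Longrightarrow\; \hat{p}_{\mathrm{MLE}} \;=\; \bar{x}
```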

  10. Example 3: MLE for a Gaussian with unknown mean and variance
     - Let x1, x2, ..., xN be iid samples from N(μ, σ²). Find the MLE for θ = (μ, σ²).
     - Solution: see the sketch below.
     - Exercise: prove that the MLE for the variance of a Gaussian is biased!
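
The standard result for iid samples from N(μ, σ²), together with the bias of the variance estimate; a sketch, not necessarily the slide's own derivation:

```latex
\hat{\mu}_{\mathrm{MLE}} \;=\; \bar{x}, \qquad
\hat{\sigma}^2_{\mathrm{MLE}} \;=\; \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad
\mathbb{E}\!\left[\hat{\sigma}^2_{\mathrm{MLE}}\right] \;=\; \frac{n-1}{n}\,\sigma^2 \;\neq\; \sigma^2
```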

  11. Property of MLE
     - To use two-variable calculus to verify that a function H(θ1, θ2) has a maximum at (θ̂1, θ̂2), it must be shown that the following three conditions hold:
       a) The first-order partial derivatives are zero.
       b) At least one second-order partial derivative is negative.
       c) The Jacobian (determinant) of the matrix of second-order partial derivatives is positive.
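
The three conditions written out for H(θ1, θ2) in standard notation; a sketch of the usual statement:

```latex
\text{(a)}\quad \frac{\partial H}{\partial \theta_1} \;=\; \frac{\partial H}{\partial \theta_2} \;=\; 0
\text{(b)}\quad \frac{\partial^2 H}{\partial \theta_1^2} \;<\; 0
\text{(c)}\quad \frac{\partial^2 H}{\partial \theta_1^2}\,\frac{\partial^2 H}{\partial \theta_2^2}
              \;-\; \left(\frac{\partial^2 H}{\partial \theta_1\,\partial \theta_2}\right)^{\!2} \;>\; 0
```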

  12. Example 4: MLE for the multinomial distribution
     - Hint: use Lagrange multipliers.
     - Solution: see the sketch below.
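
A standard Lagrange-multiplier solution, assuming category counts n_1, ..., n_K out of n trials; a sketch, not necessarily the slide's own steps:

```latex
L(p) \;\propto\; \prod_{k=1}^{K} p_k^{\,n_k}, \qquad \text{subject to } \sum_{k=1}^{K} p_k = 1
\Lambda(p, \lambda) \;=\; \sum_{k} n_k \log p_k \;+\; \lambda\Big(1 - \sum_{k} p_k\Big)
\frac{\partial \Lambda}{\partial p_k} \;=\; \frac{n_k}{p_k} - \lambda \;=\; 0
\;\Longrightarrow\; \hat{p}_k \;=\; \frac{n_k}{n}
```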

  13. MLE: Multinomial Distribution

  14. Example 5: MLE for the uniform distribution U(0, θ)
     - Solution: uses the indicator function; see the sketch below.
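
A standard derivation using the indicator function; a sketch under the stated assumptions, not necessarily the slide's own steps:

```latex
L(\theta) \;=\; \prod_{i=1}^{n} \frac{1}{\theta}\,\mathbb{1}\!\left[0 \le x_i \le \theta\right]
          \;=\; \theta^{-n}\,\mathbb{1}\!\left[\theta \ge \max_i x_i\right]
% L(theta) is decreasing in theta on the feasible region, so
\hat{\theta}_{\mathrm{MLE}} \;=\; \max_i x_i
```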

  15. Maximum A Posteriori (MAP) Estimation
     - Approximation: instead of averaging over all parameter values, consider only the most probable value (i.e., the value with the highest posterior probability).
     - Usually a very good approximation, and much simpler.
     - MAP value ≠ Expected value.
     - MAP → ML for infinite data (as long as the prior is nonzero everywhere).
     - Given a set of observations D and a prior distribution on the parameters, find the parameter vector that maximizes p(D | θ) p(θ).
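
The MAP rule from the last bullet, written out in standard notation:

```latex
\hat{\theta}_{\mathrm{MAP}} \;=\; \arg\max_{\theta}\; p(\theta \mid \mathcal{D})
                            \;=\; \arg\max_{\theta}\; p(\mathcal{D} \mid \theta)\, p(\theta)
```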

  16. Maximum A Posteriori (MAP) Estimation
     - Priors:
       - Uninformative priors: uniform distribution.
       - Conjugate priors: closed-form representation of the posterior; P(θ) and P(θ | D) have the same form.

       Distribution  -> Conjugate prior
       Binomial      -> Beta
       Multinomial   -> Dirichlet
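
A minimal sketch (not from the slides) of what the Binomial/Beta row means in practice: with a Beta(a, b) prior on the success probability and h successes out of h + t trials, the posterior is again a Beta. The prior parameters and the data below are illustrative assumptions.

```python
# Conjugacy sketch: Beta prior + Binomial likelihood -> Beta posterior.
def beta_binomial_posterior(a, b, heads, tails):
    """Posterior parameters of a Beta(a, b) prior after observing heads/tails."""
    return a + heads, b + tails

# Hypothetical data: 7 heads and 3 tails, with an assumed Beta(2, 2) prior.
a_post, b_post = beta_binomial_posterior(2, 2, heads=7, tails=3)
posterior_mean = a_post / (a_post + b_post)   # (a + h) / (a + b + h + t)
print(a_post, b_post, posterior_mean)          # 9 5 0.642857...
```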

  17. MAP vs. MLE (adopted from slides of A. Zisserman)

  18. MAP vs. MLE
     - MLE: choose the value that maximizes the probability of the observed data. Can suffer from overfitting.
     - MAP: choose the value that is most probable given the observed data and the prior belief. Can avoid overfitting.
     - When are MAP and MLE the same? (See the sketch below for a concrete comparison.)
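
A minimal sketch (not from the slides) contrasting the two point estimates for a Bernoulli parameter; the Beta(2, 2) prior and the three-heads sample are illustrative assumptions.

```python
# MLE uses only the data; MAP shrinks the estimate toward the prior,
# which can avoid overfitting on small samples.
def bernoulli_mle(heads, n):
    return heads / n

def bernoulli_map(heads, n, a=2.0, b=2.0):
    # Mode of the Beta(a + heads, b + n - heads) posterior (requires a, b > 1).
    return (heads + a - 1) / (n + a + b - 2)

print(bernoulli_mle(3, 3))   # 1.0 -> three heads in a row gives a degenerate estimate
print(bernoulli_map(3, 3))   # 0.8 -> pulled toward the prior mean of 0.5
```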

  19. Example 6: MAP for a Gaussian with unknown mean and a Gaussian prior
     - Let x1, x2, ..., xN be iid samples from N(μ, σ²), with prior μ ~ N(μ0, σ0²). Find the MAP estimate of μ.
     - Solution: see the sketch below.
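
The standard MAP estimate under these assumptions (iid samples from N(μ, σ²) with σ² known, prior μ ~ N(μ0, σ0²)); a sketch, not necessarily the slide's own steps:

```latex
\hat{\mu}_{\mathrm{MAP}}
  \;=\; \arg\max_{\mu}\; \Big[\prod_{i=1}^{N} \mathcal{N}(x_i \mid \mu, \sigma^2)\Big]\,
        \mathcal{N}(\mu \mid \mu_0, \sigma_0^2)
  \;=\; \frac{\sigma_0^2 \sum_{i=1}^{N} x_i \;+\; \sigma^2 \mu_0}{N\sigma_0^2 + \sigma^2}
```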

  20. Bayes Estimators
     - Suppose that we have a prior distribution π(θ) for θ.
     - Let f(x | θ) be the sampling distribution. The conditional distribution of θ given the sample x, and the marginal distribution m(x) of x, are written out below.
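
The two quantities referenced above, in standard notation:

```latex
\pi(\theta \mid x) \;=\; \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)}, \qquad
m(x) \;=\; \int f(x \mid \theta)\,\pi(\theta)\, d\theta
```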

  21. Example 7: Bayesian estimation for a Gaussian with unknown mean and a Gaussian prior
     - Let x1, ..., xN be N iid samples from xt ~ N(θ, σ0²), with prior θ ~ N(μ, σ²).
     - Solution: see the sketch below.
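
Under the stated assumptions (xt ~ N(θ, σ0²) iid, prior θ ~ N(μ, σ²)), the standard Gaussian-Gaussian result gives a Gaussian posterior; a sketch, not necessarily the slide's own steps:

```latex
p(\theta \mid x_1, \dots, x_N) \;=\; \mathcal{N}(\theta \mid \mu_N, \sigma_N^2), \qquad
\frac{1}{\sigma_N^2} \;=\; \frac{N}{\sigma_0^2} + \frac{1}{\sigma^2}, \qquad
\mu_N \;=\; \sigma_N^2 \left( \frac{\sum_{t=1}^{N} x_t}{\sigma_0^2} + \frac{\mu}{\sigma^2} \right)
```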

  22. Bayesian Estimators
     - Both ML and MAP return only a single, specific value for the parameter Θ. Bayesian estimation, by contrast, computes the full posterior distribution P(Θ | X).
     - If the prior is well-behaved (i.e., does not assign zero density to any feasible parameter value), then both the MLE and the Bayesian prediction converge to the same value as the number of training samples increases.

  23. Any Question? End of Lecture 2. Thank you! Spring 2015 http://ce.sharif.edu/courses//93-94/2/ce717-1/
