Machine Learning: Estimation
Hamid R. Rabiee
Spring 2015
http://ce.sharif.edu/courses//93-94/2/ce717-1/
Agenda
Introduction
Maximum Likelihood Estimation
Maximum A Posteriori Estimation
Bayesian Estimators
Density Estimation
Model the probability distribution p(x) of a random variable x, given a finite set x1, ..., xN of observations.
A good estimator is:
Unbiased: the sampling distribution of the estimator is centered on the true parameter value.
Efficient: it has the smallest possible standard error compared to other estimators.
Methods for parameter estimation:
Maximum Likelihood Estimation (MLE)
Maximum A Posteriori estimation (MAP)
Likelihood Function
Consider n independent observations of x: x_1, ..., x_n, where x follows f(x; θ).
The joint pdf for the whole data sample is:
f(x_1, ..., x_n; θ) = ∏_{i=1}^{n} f(x_i; θ)
Now evaluate this function at the data sample obtained and regard it as a function of the parameter(s). This is the likelihood function (with the x_i held constant):
L(θ | x_1, ..., x_n) = ∏_{i=1}^{n} f(x_i; θ)
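Not in the original slides: a minimal Python sketch, assuming a hypothetical N(θ, 1) model and simulated data, that evaluates the likelihood on a grid of θ values with the sample held fixed.

# Evaluate L(theta | x) on a grid of theta values; the data x stay fixed.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=20)      # hypothetical observed sample

def likelihood(theta, x):
    # L(theta | x) = prod_i f(x_i; theta) for the N(theta, 1) density
    dens = np.exp(-0.5 * (x[None, :] - theta[:, None]) ** 2) / np.sqrt(2 * np.pi)
    return dens.prod(axis=1)

thetas = np.linspace(0.0, 4.0, 401)
L = likelihood(thetas, x)
print("theta maximizing L on the grid:", thetas[np.argmax(L)])
print("sample mean (the analytic MLE):", x.mean())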
Maximum Likelihood Estimation (MLE)
Likelihood function: L(θ | x) = ∏_{i=1}^{n} f(x_i; θ)
For each sample point x, let θ̂(x) be the parameter value at which L(θ | x) attains its maximum as a function of θ. The maximum likelihood estimator of θ based on a sample x is θ̂(x).
The MLE is the parameter point for which the observed sample is most likely.
Maximum Likelihood Estimation (MLE)
If the likelihood function is differentiable (in θ_i), possible candidates for the MLE are the values (θ_1, ..., θ_k) that solve:
∂L(θ | x)/∂θ_i = 0,  i = 1, ..., k   (equivalently, ∂ log L(θ | x)/∂θ_i = 0)
Note that these solutions are only candidates. To find the exact MLE we should also check the boundary of the parameter space and verify the second-order conditions for a maximum.
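Not in the original slides: a minimal sketch of solving the first-order condition numerically, assuming a hypothetical exponential(λ) model and simulated data; the MLE is found by minimizing the negative log-likelihood and compared with the closed form n / Σ x_i.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0 / 3.0, size=200)   # hypothetical data, true rate lambda = 3

def neg_log_likelihood(lam):
    # log L(lambda) = n*log(lambda) - lambda * sum(x_i)
    return -(x.size * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50.0), method="bounded")
print("numerical MLE:", res.x)
print("analytic MLE (n / sum x_i):", x.size / x.sum())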
Example 1
Adapted from slides of Harvard University.
Example 2: MLE for a Gaussian with unknown mean
Let x_1, x_2, ..., x_n be iid samples from N(θ, 1). Find the MLE of θ.
Solution:
log L(θ) = −(n/2) log(2π) − (1/2) ∑_{i=1}^{n} (x_i − θ)^2
d log L(θ)/dθ = ∑_{i=1}^{n} (x_i − θ) = 0  ⇒  θ̂ = (1/n) ∑_{i=1}^{n} x_i = x̄
Maximum Likelihood Estimation (MLE)
Sometimes it is more convenient to use the log-likelihood.
Let x_1, x_2, ..., x_n be iid samples from Bernoulli(p); then the likelihood function is:
L(p) = ∏_{i=1}^{n} p^{x_i} (1 − p)^{1 − x_i} = p^{∑ x_i} (1 − p)^{n − ∑ x_i}
Maximizing log L(p) gives p̂ = (1/n) ∑_{i=1}^{n} x_i.
Invariance: if θ̂ is the MLE of θ, then for any function τ(θ) the MLE of τ(θ) is τ(θ̂).
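Not in the original slides: a minimal sketch with hypothetical data showing the Bernoulli MLE p̂ = mean(x) and the invariance property applied to the odds τ(p) = p/(1 − p).

import numpy as np

rng = np.random.default_rng(2)
x = rng.binomial(n=1, p=0.3, size=500)           # iid Bernoulli(0.3) sample

p_hat = x.mean()                                 # maximizes p^sum(x) * (1-p)^(n-sum(x))
odds_hat = p_hat / (1.0 - p_hat)                 # MLE of tau(p) = p/(1-p) by invariance
print("p_hat =", p_hat, " odds_hat =", odds_hat)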
Example 3: MLE for a Gaussian with unknown mean and variance
Let x_1, x_2, ..., x_N be iid samples from N(μ, σ²). Find the MLE for θ = (μ, σ²).
Solution:
log L(μ, σ²) = −(N/2) log(2πσ²) − (1/(2σ²)) ∑_{i=1}^{N} (x_i − μ)^2
Setting the partial derivatives to zero gives:
μ̂ = (1/N) ∑_{i=1}^{N} x_i = x̄
σ̂² = (1/N) ∑_{i=1}^{N} (x_i − x̄)^2
Exercise: prove that the MLE for the variance of a Gaussian is biased (a simulation sketch follows below).
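Not in the original slides: a simulation sketch (not a proof), with hypothetical settings, illustrating that the MLE of the Gaussian variance has expectation ((N − 1)/N) σ² rather than σ².

import numpy as np

rng = np.random.default_rng(3)
N, sigma2, trials = 5, 4.0, 100_000
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
mle_var = samples.var(axis=1, ddof=0)            # ddof=0 divides by N: the MLE
print("mean of MLE variance:", mle_var.mean())   # close to (N-1)/N * sigma2 = 3.2
print("true variance       :", sigma2)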
Property of MLE
To use bivariate calculus to verify that a function H(θ_1, θ_2) has a maximum at (θ̂_1, θ̂_2), it must be shown that the following three conditions hold:
a) The first-order partial derivatives are zero: ∂H/∂θ_1 = ∂H/∂θ_2 = 0 at (θ̂_1, θ̂_2).
b) At least one second-order partial derivative is negative: ∂²H/∂θ_1² < 0 or ∂²H/∂θ_2² < 0.
c) The determinant of the matrix of second-order derivatives (the Hessian) is positive:
(∂²H/∂θ_1²)(∂²H/∂θ_2²) − (∂²H/∂θ_1 ∂θ_2)² > 0
Example 4: MLE for the multinomial distribution (hint: use Lagrange multipliers)
Observed counts N_1, ..., N_k with ∑_i N_i = N give the likelihood
L(θ_1, ..., θ_k) ∝ ∏_{i=1}^{k} θ_i^{N_i},  subject to ∑_{i=1}^{k} θ_i = 1.
Solution: maximize log L subject to the constraint (next slide).
MLE: Multinomial Distribution
Introduce a Lagrange multiplier λ and maximize
∑_i N_i log θ_i + λ (1 − ∑_i θ_i).
Setting the derivative with respect to θ_i to zero gives N_i/θ_i = λ, so θ_i = N_i/λ; the constraint ∑_i θ_i = 1 then gives λ = ∑_i N_i = N.
Hence θ̂_i = N_i / N, the relative frequency of category i.
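Not in the original slides: a minimal sketch with hypothetical counts showing that the relative frequencies maximize the multinomial log-likelihood; a small perturbation check confirms the log-likelihood does not increase when probability mass is shifted between categories.

import numpy as np

counts = np.array([12, 30, 8, 50])               # hypothetical N_1..N_k
theta_hat = counts / counts.sum()                # theta_hat_i = N_i / N

def log_lik(theta):
    return np.sum(counts * np.log(theta))

print("theta_hat:", theta_hat, " log L:", log_lik(theta_hat))
eps = 0.01                                       # move mass from category 2 to category 1
perturbed = theta_hat + np.array([eps, -eps, 0.0, 0.0])
print("perturbed log L:", log_lik(perturbed))    # should be smaller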
Example 5: MLE for the uniform distribution U(0, θ)
Solution: write the likelihood with an indicator function:
L(θ) = ∏_{i=1}^{n} (1/θ) I(0 ≤ x_i ≤ θ) = θ^{−n} I(θ ≥ max_i x_i)
L(θ) is decreasing in θ on θ ≥ max_i x_i and zero below it, so the maximum is attained at θ̂ = max(x_1, ..., x_n).
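Not in the original slides: a minimal sketch with simulated data showing that the U(0, θ) likelihood is zero below the sample maximum and decreasing above it, so the MLE is the sample maximum.

import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 7.5, size=50)               # hypothetical true theta = 7.5

def likelihood(theta):
    # theta^(-n) if the indicator I(theta >= max_i x_i) holds, else 0
    return theta ** (-x.size) if theta >= x.max() else 0.0

print("MLE (sample max):", x.max())
print("L at MLE        :", likelihood(x.max()))
print("L slightly below:", likelihood(x.max() - 0.1))   # 0: violates the indicator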
Maximum A Posteriori Estimation
Approximation: instead of averaging over all parameter values, consider only the most probable value (i.e., the value with the highest posterior probability).
Usually a very good approximation, and much simpler.
MAP value ≠ expected value.
MAP → ML for infinite data (as long as the prior is nonzero everywhere).
Given a set of observations D and a prior distribution on the parameters, find the parameter vector that maximizes p(D | θ) p(θ):
θ̂_MAP = argmax_θ p(D | θ) p(θ)
Maximum A Posteriori Estimation
Priors:
Uninformative priors: uniform distribution.
Conjugate priors: closed-form representation of the posterior; P(θ) and P(θ|D) have the same form.
Distribution      Conjugate prior
Binomial          Beta
Multinomial       Dirichlet
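Not in the original slides: a minimal sketch of a conjugate update with hypothetical numbers. With a Beta(a, b) prior on the Bernoulli/Binomial parameter, the posterior after h successes in n trials is Beta(a + h, b + n − h), and its mode gives the MAP estimate.

a, b = 2.0, 2.0                                  # prior pseudo-counts
n, h = 20, 14                                    # hypothetical data: 14 heads in 20 flips

post_a, post_b = a + h, b + (n - h)              # posterior stays in the Beta family
map_estimate = (post_a - 1) / (post_a + post_b - 2)   # mode of Beta(post_a, post_b)
mle_estimate = h / n
print("posterior: Beta(%.0f, %.0f)" % (post_a, post_b))
print("MAP:", map_estimate, " MLE:", mle_estimate)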
MAP vs. MLE
Adapted from slides of A. Zisserman.
MAP vs. MLE
MLE: choose the value that maximizes the probability of the observed data, θ̂_MLE = argmax_θ p(D | θ). It can suffer from overfitting.
MAP: choose the value that is most probable given the observed data and the prior belief, θ̂_MAP = argmax_θ p(D | θ) p(θ). It can avoid overfitting.
When are MAP and MLE the same?
Example 6: MAP for a Gaussian with unknown mean and a Gaussian prior
Let x_1, x_2, ..., x_N be iid samples from N(μ, σ²) with σ² known, and let the prior on μ be N(μ_0, σ_0²). Find the MAP estimate of μ.
Solution: maximizing log p(x_1, ..., x_N | μ) + log p(μ) over μ gives
μ̂_MAP = (σ_0² ∑_{i=1}^{N} x_i + σ² μ_0) / (N σ_0² + σ²),
a weighted combination of the sample mean and the prior mean; as N → ∞ it approaches the MLE x̄ (see the sketch below).
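Not in the original slides: a minimal sketch with simulated data that checks the closed-form MAP estimate above against a brute-force grid search over log p(D | μ) + log p(μ).

import numpy as np

rng = np.random.default_rng(5)
mu0, sigma0_2 = 0.0, 1.0                          # prior N(mu0, sigma0^2)
sigma2, N = 4.0, 10                               # known likelihood variance
x = rng.normal(2.5, np.sqrt(sigma2), size=N)      # hypothetical observations

mu_map = (sigma0_2 * x.sum() + sigma2 * mu0) / (N * sigma0_2 + sigma2)

grid = np.linspace(-5, 5, 10_001)
log_post = (-0.5 * ((x[None, :] - grid[:, None]) ** 2).sum(axis=1) / sigma2
            - 0.5 * (grid - mu0) ** 2 / sigma0_2)
print("closed-form MAP:", mu_map, " grid argmax:", grid[np.argmax(log_post)])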
Bayes Estimators
Suppose that we have a prior distribution for θ: π(θ).
Let f(x | θ) be the sampling distribution; then the conditional distribution of θ given the sample x is:
π(θ | x) = f(x | θ) π(θ) / m(x)
where m(x) is the marginal distribution of x:
m(x) = ∫ f(x | θ) π(θ) dθ
Example 7: Bayesian estimation for a Gaussian with unknown mean and a Gaussian prior
Let x_1, ..., x_N be iid samples from x_t ~ N(θ, σ_0²) with σ_0² known, and let the prior be θ ~ N(μ, σ²).
Solution: the posterior is Gaussian,
π(θ | x_1, ..., x_N) = N( (σ² ∑_t x_t + σ_0² μ) / (N σ² + σ_0²),  σ² σ_0² / (N σ² + σ_0²) ),
so the Bayes estimator (the posterior mean) shrinks the sample mean toward the prior mean μ.
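Not in the original slides: a minimal sketch with simulated data computing the posterior mean and variance from the closed form above; under squared-error loss, the posterior mean is the Bayes estimator.

import numpy as np

rng = np.random.default_rng(6)
mu, sigma2 = 0.0, 1.0                             # prior theta ~ N(mu, sigma^2)
sigma0_2, N = 2.0, 15                             # known observation variance
x = rng.normal(1.8, np.sqrt(sigma0_2), size=N)    # hypothetical observations

post_var = 1.0 / (N / sigma0_2 + 1.0 / sigma2)    # posterior precision is the sum of precisions
post_mean = post_var * (x.sum() / sigma0_2 + mu / sigma2)
print("posterior mean (Bayes estimator):", post_mean)
print("posterior variance              :", post_var)
print("compare: MLE (sample mean)      :", x.mean())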
Bayesian Estimators
Both ML and MAP return only a single, specific value for the parameter Θ.
Bayesian estimation, by contrast, computes the full posterior distribution P(Θ | X).
If the prior is well-behaved (i.e., it does not assign zero density to any feasible parameter value), then both the MLE and the Bayesian prediction converge to the same value as the number of training samples increases.
Any Questions?
End of Lecture 2
Thank you!
Spring 2015
http://ce.sharif.edu/courses//93-94/2/ce717-1/