Parametric Methods
Steven J Zeil, Old Dominion University, Fall 2010
Outline
1. Distributions
2. Estimating Distribution Parameters
   - Maximum Likelihood Estimation
   - Evaluating an Estimator: Bias and Variance
   - Bayes' Estimator
3. Parametric Classification
4. Regression
   - Regression Error
   - Linear Regression
   - Polynomial Regression
   - Tuning Model Complexity
   - Model Selection
Parametric Methods
- Assume that the sample is drawn from a known distribution.
- Advantage: the model can be formed from a small number of parameters, e.g., mean and variance.
- Estimate the parameters from the sample to get an estimated distribution.
- Then use that distribution to make decisions.
Distributions
Classification discriminant function:
    g_i(x) = P(C_i) \, p(x \mid C_i)
- For classification, we need to estimate the densities p(x \mid C_i) and the priors P(C_i).
- For regression, we need to estimate p(y \mid x).
- In this chapter, we use single variables (\vec{x} = [x]).
Example Distributions
Bernoulli: x \in \{0, 1\}
    P(x) = p_0^x (1 - p_0)^{1 - x}
Multinomial: K > 2 states, x_i \in \{0, 1\}
    P(x_1, x_2, \ldots, x_K) = \prod_i p_i^{x_i}
Example Distributions
Gaussian (Normal):
    p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( \frac{-(x - \mu)^2}{2\sigma^2} \right)
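The two densities above are easy to evaluate directly. A minimal sketch in Python (the function names are illustrative, not from the lecture):

```python
import math

def bernoulli_pmf(x, p0):
    """P(x) = p0^x * (1 - p0)^(1 - x), for x in {0, 1}."""
    return p0 ** x * (1 - p0) ** (1 - x)

def gaussian_pdf(x, mu, sigma):
    """N(mu, sigma^2) density at x."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))
```

For example, the standard normal density at x = 0 is 1 / sqrt(2*pi), roughly 0.399.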
Likelihood
An iid sample X = \{x^t\}_t is drawn from p(x \mid \theta).
How do we find the \theta that makes our sample as likely as possible?
Because the x^t are independent, the likelihood of \theta given X is
    l(\theta \mid X) \equiv p(X \mid \theta) = \prod_{t=1}^{N} p(x^t \mid \theta)
Maximum Likelihood Estimation (MLE)
Likelihood:
    l(\theta \mid X) \equiv p(X \mid \theta) = \prod_{t=1}^{N} p(x^t \mid \theta)
In MLE, find the \theta that makes X the most likely to be seen: search for the \theta that maximizes l(\theta \mid X).
To simplify, we often instead maximize the log likelihood:
    L(\theta \mid X) \equiv \log l(\theta \mid X) = \sum_{t=1}^{N} \log p(x^t \mid \theta)
Maximum likelihood estimator:
    \theta^* = \operatorname{argmax}_\theta L(\theta \mid X)
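When no closed form is available, the argmax can be found numerically. A minimal sketch for Bernoulli data, using a simple grid search over candidate p_0 values (the function names and the grid-search strategy are illustrative choices, not from the lecture):

```python
import math

def log_likelihood(sample, p0):
    """L(p0|X) = sum_t log P(x^t | p0) for a Bernoulli sample of 0s and 1s."""
    return sum(x * math.log(p0) + (1 - x) * math.log(1 - p0) for x in sample)

def mle_grid(sample, grid_size=1000):
    """Return the candidate p0 in (0, 1) that maximizes the log likelihood."""
    candidates = [(i + 1) / (grid_size + 2) for i in range(grid_size)]
    return max(candidates, key=lambda p: log_likelihood(sample, p))
```

For the sample [1, 1, 0, 1], the numerical maximizer lands near 3/4, agreeing with the closed-form Bernoulli MLE (the sample mean).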
Example MLEs
Bernoulli: x \in \{0, 1\}, P(x) = p_0^x (1 - p_0)^{1 - x}
    L(p_0 \mid X) = \log \prod_t p_0^{x^t} (1 - p_0)^{1 - x^t}
    MLE: \hat{p}_0 = \frac{\sum_t x^t}{N}
Multinomial: K > 2 states, x_i \in \{0, 1\}, P(x_1, x_2, \ldots, x_K) = \prod_i p_i^{x_i}
    L(p_1, p_2, \ldots, p_K \mid X) = \log \prod_t \prod_i p_i^{x_i^t}
    MLE: \hat{p}_i = \frac{\sum_t x_i^t}{N}
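The multinomial MLE above is just the per-state frequency. A minimal sketch, assuming each sample is a one-hot list with exactly one 1 (function name is illustrative):

```python
def multinomial_mle(samples):
    """MLE p_i = (sum_t x_i^t) / N for one-hot encoded samples."""
    n = len(samples)
    k = len(samples[0])
    return [sum(s[i] for s in samples) / n for i in range(k)]
```

With four samples where state 1 occurs twice and states 2 and 3 once each, the estimates are [0.5, 0.25, 0.25].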
Example MLEs
Gaussian (Normal):
    p(x) = N(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( \frac{-(x - \mu)^2}{2\sigma^2} \right)
    MLE for \mu:      m = \frac{\sum_t x^t}{N}
    MLE for \sigma^2: s^2 = \frac{\sum_t (x^t - m)^2}{N}
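Both Gaussian MLEs are direct sums over the sample. A minimal sketch (function name is illustrative):

```python
def gaussian_mle(sample):
    """Return (m, s2): MLE of mu is the sample mean; MLE of sigma^2
    divides the sum of squared deviations by N (not N - 1)."""
    n = len(sample)
    m = sum(sample) / n
    s2 = sum((x - m) ** 2 for x in sample) / n
    return m, s2
```

For the sample [1, 2, 3] this gives m = 2 and s^2 = 2/3; note the N (rather than N - 1) in the denominator, which matters for the bias discussion below.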
Bias and Variance
Population X drawn from p(x \mid \theta); estimator of \theta: d_i = d(X_i) on sample X_i.
    Bias:     b_\theta(d) = E[d] - \theta
    Variance: E[(d - E[d])^2]
    Mean square error:
        r(d, \theta) = E[(d - \theta)^2]
                     = (E[d] - \theta)^2 + E[(d - E[d])^2]
                     = \text{Bias}^2 + \text{Variance}
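Bias can be checked empirically by averaging an estimator over many simulated samples. A minimal Monte Carlo sketch (function name, sample sizes, and trial counts are illustrative choices): for Gaussian data, the MLE s^2 is biased, since E[s^2] = ((N - 1)/N) sigma^2.

```python
import random

def simulate_variance_bias(n, sigma2, trials, seed=0):
    """Monte Carlo estimate of E[s^2] for the MLE variance estimator
    on samples of size n drawn from N(0, sigma2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
        m = sum(sample) / n
        total += sum((x - m) ** 2 for x in sample) / n
    return total / trials
```

With n = 5 and sigma^2 = 1, the average of s^2 comes out near 4/5, not 1, illustrating a nonzero bias.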
Estimators
If we have prior knowledge of p(\theta), Bayes' rule gives:
    p(\theta \mid X) = \frac{p(X \mid \theta) \, p(\theta)}{p(X)}
                     = \frac{p(X \mid \theta) \, p(\theta)}{\int p(X \mid \theta') \, p(\theta') \, d\theta'}
Problem: except in special cases, this won't have a nice, closed-form solution. Options:
- Numerical estimation
- Use simpler "point" estimators
- If the form is tractable, we can use the Bayes' estimator
Simpler Estimators
Maximum a posteriori (MAP):
    \theta_{MAP} = \operatorname{argmax}_\theta p(\theta \mid X)
Maximum likelihood (ML):
    \theta_{ML} = \operatorname{argmax}_\theta p(X \mid \theta)
Bayes' Estimator
Bayes:
    \theta_{Bayes} = E[\theta \mid X] = \int \theta \, p(\theta \mid X) \, d\theta
Example: x^t \sim N(\theta, \sigma_0^2) and \theta \sim N(\mu, \sigma^2). Let m be the mean of the sample.
By the central limit theorem, the distribution of even a non-normal population's mean is approximately normal, centered on the population mean, with a standard deviation of \sigma_0 / \sqrt{N}.
    \theta_{ML} = m
    \theta_{MAP} = \theta_{Bayes} = E[\theta \mid X]
        = \frac{N / \sigma_0^2}{N / \sigma_0^2 + 1 / \sigma^2} \, m + \frac{1 / \sigma^2}{N / \sigma_0^2 + 1 / \sigma^2} \, \mu
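The final formula is a precision-weighted average of the sample mean and the prior mean: more data (larger N) or a noisier prior (larger sigma^2) pulls the estimate toward m. A minimal sketch (function name is illustrative):

```python
def bayes_estimate(sample, sigma0_sq, mu, sigma_sq):
    """Posterior mean E[theta|X] for x^t ~ N(theta, sigma0_sq)
    with prior theta ~ N(mu, sigma_sq)."""
    n = len(sample)
    m = sum(sample) / n
    w_data = n / sigma0_sq    # precision contributed by the sample
    w_prior = 1.0 / sigma_sq  # precision contributed by the prior
    return (w_data * m + w_prior * mu) / (w_data + w_prior)
```

For example, with sample [1, 3] (so m = 2), sigma_0^2 = 1, and a N(0, 1) prior, the estimate is (2*2 + 1*0)/3 = 4/3, shrunk from m = 2 toward the prior mean 0.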