Probabilistic Graphical Models
Learning: Parameter Estimation
Maximum Likelihood Estimation
Daphne Koller
Biased Coin Example

P is a Bernoulli distribution: P(X=1) = θ, P(X=0) = 1 − θ. The tosses are sampled IID from P:
• Tosses are independent of each other
• Tosses are sampled from the same distribution (identically distributed)
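As a concrete illustration (my own sketch, not from the lecture), the following snippet samples IID tosses from a Bernoulli distribution with a hypothetical bias θ = 0.7:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.7          # hypothetical true bias P(X=1); any value in [0,1] works
M = 100              # number of IID tosses

# Each toss is drawn independently from the same Bernoulli(theta) distribution
tosses = rng.binomial(n=1, p=theta, size=M)
print(tosses[:10])   # array of 0s (tails) and 1s (heads)
```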
IID as a PGM

[Plate model: the parameter θ is a parent of each data instance X[1], ..., X[M]]

$$P(x[m] \mid \theta) = \begin{cases} \theta & x[m] = x^1 \\ 1 - \theta & x[m] = x^0 \end{cases}$$
Maximum Likelihood Estimation

• Goal: find θ ∈ [0,1] that predicts D well
• Prediction quality = likelihood of D given θ:

$$L(D : \theta) = P(D \mid \theta) = \prod_{m=1}^{M} P(x[m] \mid \theta)$$

[Plot: L(D : θ) as a function of θ ∈ [0, 1] for D = <H, T, T, H, H>]
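A minimal sketch (my addition, not from the slides) that evaluates this likelihood on a grid for the dataset D = <H, T, T, H, H> shown in the plot; the grid maximum lands at θ = 0.6:

```python
import numpy as np

# D = <H, T, T, H, H>, encoded as 1 = heads, 0 = tails
D = np.array([1, 0, 0, 1, 1])

thetas = np.linspace(0, 1, 101)
# L(D : theta) = prod_m P(x[m] | theta) = theta^{M_H} * (1 - theta)^{M_T}
likelihood = thetas ** D.sum() * (1 - thetas) ** (len(D) - D.sum())

print(thetas[np.argmax(likelihood)])  # 0.6, the peak of the likelihood curve
```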
Maximum Likelihood Estimator

• Observations: M_H heads and M_T tails
• Find θ maximizing the likelihood

$$L(\theta : M_H, M_T) = \theta^{M_H} (1 - \theta)^{M_T}$$

• Equivalent to maximizing the log-likelihood

$$\ell(\theta : M_H, M_T) = M_H \log \theta + M_T \log(1 - \theta)$$

• Differentiating the log-likelihood, setting the derivative $\frac{M_H}{\theta} - \frac{M_T}{1 - \theta}$ to zero, and solving for θ:

$$\hat{\theta} = \frac{M_H}{M_H + M_T}$$
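The closed-form estimator is a one-liner; a sketch using the counts M_H = 3, M_T = 2 from the running example:

```python
def bernoulli_mle(M_H, M_T):
    # theta_hat = M_H / (M_H + M_T), the maximizer of the log-likelihood
    return M_H / (M_H + M_T)

print(bernoulli_mle(3, 2))  # 0.6 for D = <H, T, T, H, H>
```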
Sufficient Statistics

• For computing θ in the coin toss example, we only needed M_H and M_T, since

$$L(D : \theta) = \theta^{M_H} (1 - \theta)^{M_T}$$

• ⟹ M_H and M_T are sufficient statistics
Sufficient Statistics

• A function s(D) is a sufficient statistic, mapping instances to a vector in ℝ^k, if for any two datasets D and D' and any θ ∈ Θ we have

$$\sum_{x[i] \in D} s(x[i]) = \sum_{x[i] \in D'} s(x[i]) \;\Rightarrow\; L(D : \theta) = L(D' : \theta)$$

[Diagram: many different datasets map to the same statistics]
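To make the definition concrete, a sketch (my own example, not from the slides) checking that two different Bernoulli datasets with the same sufficient statistics <M_H, M_T> have identical likelihoods at every θ:

```python
import numpy as np

def likelihood(D, theta):
    # L(D : theta) for Bernoulli data, D encoded as 1 = heads, 0 = tails
    return theta ** D.sum() * (1 - theta) ** (len(D) - D.sum())

D1 = np.array([1, 0, 0, 1, 1])  # H,T,T,H,H -> statistics <3, 2>
D2 = np.array([1, 1, 1, 0, 0])  # H,H,H,T,T -> same statistics <3, 2>

thetas = np.linspace(0.01, 0.99, 99)
print(np.allclose(likelihood(D1, thetas), likelihood(D2, thetas)))  # True
```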
Sufficient Statistic for Multinomial

• For a dataset D over a variable X with k values, the sufficient statistics are counts <M_1, ..., M_k>, where M_i is the number of times that X[m] = x_i in D
• The per-instance sufficient statistic s(x) is a tuple of dimension k: s(x_i) = (0, ..., 0, 1, 0, ..., 0), with the 1 in position i

$$L(D : \theta) = \prod_{i=1}^{k} \theta_i^{M_i}$$
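A sketch (with made-up data) that accumulates the multinomial sufficient statistics by summing the one-hot vectors s(x):

```python
import numpy as np

k = 3                                   # X takes values x_1, ..., x_k
D = np.array([0, 2, 1, 0, 0, 2, 1, 0])  # hypothetical data, values coded 0..k-1

# s(x_i) is the one-hot vector with a 1 in position i; summing them gives counts
one_hot = np.eye(k, dtype=int)[D]       # shape (M, k)
counts = one_hot.sum(axis=0)            # <M_1, ..., M_k>
print(counts)                           # [4 2 2]
```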
Sufficient Statistic for Gaussian

• Gaussian distribution:

$$P(X) \sim N(\mu, \sigma^2) \text{ if } p(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$

• Rewrite as

$$p(X) = \left(2\pi\sigma^2\right)^{-1/2} \exp\left\{ -\frac{x^2}{2\sigma^2} + \frac{\mu x}{\sigma^2} - \frac{\mu^2}{2\sigma^2} \right\}$$

• Sufficient statistics for the Gaussian: s(x) = <1, x, x²>
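A sketch (synthetic data, my own example) accumulating the Gaussian sufficient statistics over a dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # hypothetical Gaussian sample

# Per-instance statistic s(x) = <1, x, x^2>; summing over the data gives
# <M, sum x, sum x^2>, which is all the likelihood depends on
s = np.stack([np.ones_like(x), x, x ** 2], axis=1)
M, sum_x, sum_x2 = s.sum(axis=0)
print(M, sum_x, sum_x2)
```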
Maximum Likelihood Estimation

• MLE Principle: choose θ to maximize L(D : θ)
• Multinomial MLE:

$$\hat{\theta}_i = \frac{M_i}{\sum_{i=1}^{k} M_i}$$

• Gaussian MLE:

$$\hat{\mu} = \frac{1}{M} \sum_m x[m], \qquad \hat{\sigma} = \sqrt{\frac{1}{M} \sum_m \left( x[m] - \hat{\mu} \right)^2}$$
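A sketch (my own, with hypothetical inputs) of both closed-form MLEs computed from the data:

```python
import numpy as np

def multinomial_mle(counts):
    # theta_hat_i = M_i / sum_i M_i
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def gaussian_mle(x):
    # mu_hat = (1/M) sum_m x[m];  sigma_hat = sqrt((1/M) sum_m (x[m]-mu_hat)^2)
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = np.sqrt(np.mean((x - mu) ** 2))
    return mu, sigma

print(multinomial_mle([4, 2, 2]))  # [0.5  0.25 0.25]
rng = np.random.default_rng(0)
print(gaussian_mle(rng.normal(2.0, 1.5, size=1000)))  # approx (2.0, 1.5)
```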
Summary

• Maximum likelihood estimation is a simple principle for parameter selection given D
• The likelihood function is uniquely determined by sufficient statistics that summarize D
• MLE has a closed-form solution for many parametric distributions