Probabilistic Graphical Models
Learning: Parameter Estimation
Maximum Likelihood Estimation
Daphne Koller
Biased Coin Example

P is a Bernoulli distribution: P(X=1) = θ, P(X=0) = 1 − θ. The tosses are sampled IID from P:
• Tosses are independent of each other
• Tosses are sampled from the same distribution (identically distributed)
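As a concrete illustration (my own sketch, not from the lecture), the following snippet samples IID tosses from a Bernoulli distribution with a hypothetical bias θ = 0.7:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.7          # hypothetical true bias P(X=1); any value in [0,1] works
M = 100              # number of IID tosses

# Each toss is drawn independently from the same Bernoulli(theta) distribution
tosses = rng.binomial(n=1, p=theta, size=M)
print(tosses[:10])   # array of 0s (tails) and 1s (heads)
```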
IID as a PGM

[Plate model: the parameter θ is a parent of each data instance X[1], ..., X[M]]

$$P(x[m] \mid \theta) = \begin{cases} \theta & x[m] = x^1 \\ 1 - \theta & x[m] = x^0 \end{cases}$$
Maximum Likelihood Estimation

• Goal: find θ ∈ [0,1] that predicts D well
• Prediction quality = likelihood of D given θ:

$$L(D : \theta) = P(D \mid \theta) = \prod_{m=1}^{M} P(x[m] \mid \theta)$$

[Plot: L(D : θ) as a function of θ ∈ [0, 1] for D = <H, T, T, H, H>]
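A minimal sketch (my addition, not from the slides) that evaluates this likelihood on a grid for the dataset D = <H, T, T, H, H> shown in the plot; the grid maximum lands at θ = 0.6:

```python
import numpy as np

# D = <H, T, T, H, H>, encoded as 1 = heads, 0 = tails
D = np.array([1, 0, 0, 1, 1])

thetas = np.linspace(0, 1, 101)
# L(D : theta) = prod_m P(x[m] | theta) = theta^{M_H} * (1 - theta)^{M_T}
likelihood = thetas ** D.sum() * (1 - thetas) ** (len(D) - D.sum())

print(thetas[np.argmax(likelihood)])  # 0.6, the peak of the likelihood curve
```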
Maximum Likelihood Estimator

• Observations: M_H heads and M_T tails
• Find θ maximizing the likelihood

$$L(\theta : M_H, M_T) = \theta^{M_H} (1 - \theta)^{M_T}$$

• Equivalent to maximizing the log-likelihood

$$\ell(\theta : M_H, M_T) = M_H \log \theta + M_T \log(1 - \theta)$$

• Differentiating the log-likelihood, setting the derivative $\frac{M_H}{\theta} - \frac{M_T}{1 - \theta}$ to zero, and solving for θ:

$$\hat{\theta} = \frac{M_H}{M_H + M_T}$$
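The closed-form estimator is a one-liner; a sketch using the counts M_H = 3, M_T = 2 from the running example:

```python
def bernoulli_mle(M_H, M_T):
    # theta_hat = M_H / (M_H + M_T), the maximizer of the log-likelihood
    return M_H / (M_H + M_T)

print(bernoulli_mle(3, 2))  # 0.6 for D = <H, T, T, H, H>
```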
Sufficient Statistics

• For computing θ in the coin toss example, we only needed M_H and M_T, since

$$L(D : \theta) = \theta^{M_H} (1 - \theta)^{M_T}$$

• ⟹ M_H and M_T are sufficient statistics
Sufficient Statistics

• A function s(D) is a sufficient statistic, mapping instances to a vector in ℝ^k, if for any two datasets D and D' and any θ ∈ Θ we have

$$\sum_{x[i] \in D} s(x[i]) = \sum_{x[i] \in D'} s(x[i]) \;\Rightarrow\; L(D : \theta) = L(D' : \theta)$$

[Diagram: many different datasets map to the same statistics]
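To make the definition concrete, a sketch (my own example, not from the slides) checking that two different Bernoulli datasets with the same sufficient statistics <M_H, M_T> have identical likelihoods at every θ:

```python
import numpy as np

def likelihood(D, theta):
    # L(D : theta) for Bernoulli data, D encoded as 1 = heads, 0 = tails
    return theta ** D.sum() * (1 - theta) ** (len(D) - D.sum())

D1 = np.array([1, 0, 0, 1, 1])  # H,T,T,H,H -> statistics <3, 2>
D2 = np.array([1, 1, 1, 0, 0])  # H,H,H,T,T -> same statistics <3, 2>

thetas = np.linspace(0.01, 0.99, 99)
print(np.allclose(likelihood(D1, thetas), likelihood(D2, thetas)))  # True
```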
Sufficient Statistic for Multinomial

• For a dataset D over a variable X with k values, the sufficient statistics are counts <M_1, ..., M_k>, where M_i is the number of times that X[m] = x_i in D
• The per-instance sufficient statistic s(x) is a tuple of dimension k: s(x_i) = (0, ..., 0, 1, 0, ..., 0), with the 1 in position i

$$L(D : \theta) = \prod_{i=1}^{k} \theta_i^{M_i}$$
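A sketch (with made-up data) that accumulates the multinomial sufficient statistics by summing the one-hot vectors s(x):

```python
import numpy as np

k = 3                                   # X takes values x_1, ..., x_k
D = np.array([0, 2, 1, 0, 0, 2, 1, 0])  # hypothetical data, values coded 0..k-1

# s(x_i) is the one-hot vector with a 1 in position i; summing them gives counts
one_hot = np.eye(k, dtype=int)[D]       # shape (M, k)
counts = one_hot.sum(axis=0)            # <M_1, ..., M_k>
print(counts)                           # [4 2 2]
```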
Sufficient Statistic for Gaussian

• Gaussian distribution:

$$P(X) \sim N(\mu, \sigma^2) \text{ if } p(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$

• Rewrite as

$$p(X) = \left(2\pi\sigma^2\right)^{-1/2} \exp\left\{ -\frac{x^2}{2\sigma^2} + \frac{\mu x}{\sigma^2} - \frac{\mu^2}{2\sigma^2} \right\}$$

• Sufficient statistics for the Gaussian: s(x) = <1, x, x²>
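A sketch (synthetic data, my own example) accumulating the Gaussian sufficient statistics over a dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # hypothetical Gaussian sample

# Per-instance statistic s(x) = <1, x, x^2>; summing over the data gives
# <M, sum x, sum x^2>, which is all the likelihood depends on
s = np.stack([np.ones_like(x), x, x ** 2], axis=1)
M, sum_x, sum_x2 = s.sum(axis=0)
print(M, sum_x, sum_x2)
```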
Maximum Likelihood Estimation

• MLE Principle: choose θ to maximize L(D : θ)
• Multinomial MLE:

$$\hat{\theta}_i = \frac{M_i}{\sum_{i=1}^{k} M_i}$$

• Gaussian MLE:

$$\hat{\mu} = \frac{1}{M} \sum_m x[m], \qquad \hat{\sigma} = \sqrt{\frac{1}{M} \sum_m \left( x[m] - \hat{\mu} \right)^2}$$
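A sketch (my own, with hypothetical inputs) of both closed-form MLEs computed from the data:

```python
import numpy as np

def multinomial_mle(counts):
    # theta_hat_i = M_i / sum_i M_i
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def gaussian_mle(x):
    # mu_hat = (1/M) sum_m x[m];  sigma_hat = sqrt((1/M) sum_m (x[m]-mu_hat)^2)
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = np.sqrt(np.mean((x - mu) ** 2))
    return mu, sigma

print(multinomial_mle([4, 2, 2]))  # [0.5  0.25 0.25]
rng = np.random.default_rng(0)
print(gaussian_mle(rng.normal(2.0, 1.5, size=1000)))  # approx (2.0, 1.5)
```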
Summary

• Maximum likelihood estimation is a simple principle for parameter selection given D
• The likelihood function is uniquely determined by sufficient statistics that summarize D
• MLE has a closed-form solution for many parametric distributions