Directed Probabilistic Graphical Models CMSC 678 UMBC
Announcement 1: Assignment 3 due Wednesday, April 11th, 11:59 AM. Any questions?
Announcement 2: Progress Report on Project due Monday, April 16th, 11:59 AM. Build on the proposal: update to address comments; discuss the progress you've made; discuss what remains to be done; discuss any new blocks you've experienced (or anticipate experiencing). Any questions?
Outline Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
Recap from last time…
Expectation Maximization (EM): a two-step, iterative algorithm
0. Assume some value for your parameters θ^(t)
1. E-step: count under uncertainty, assuming these parameters: compute the posterior p^(t)(z_i) for each hidden z_i and use it to form estimated counts count(z_i, x_i)
2. M-step: maximize the log-likelihood, assuming these uncertain (estimated) counts, to obtain updated parameters θ^(t+1)
EM Math
E-step: count under uncertainty, using the posterior under the old parameters θ^(t). M-step: maximize the expected complete-data log-likelihood under the new parameters θ:
max_θ 𝔼_{z ∼ p_θ^(t)(·|x)} [ log p_θ(z, x) ]
Let 𝒞(θ) = log-likelihood of the complete data (X, Y), 𝒫(θ) = posterior log-likelihood of the incomplete (hidden) data Y, and ℳ(θ) = marginal log-likelihood of the observed data X. Then
ℳ(θ) = 𝔼_{Y∼θ^(t)} [ 𝒞(θ) | X ] − 𝔼_{Y∼θ^(t)} [ 𝒫(θ) | X ]
and EM does not decrease the marginal log-likelihood.
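(Not from the slides: a toy numeric check, with made-up probabilities for a single observation and a binary hidden variable, that the decomposition ℳ(θ) = 𝔼[𝒞(θ)|X] − 𝔼[𝒫(θ)|X] holds.)

```python
# Numeric check of: marginal log-lik = E_q[complete log-lik] - E_q[posterior log-lik]
import numpy as np

def joint(theta):
    # p_theta(x, z) for one fixed observation x and binary latent z; values are invented
    return np.array([theta * 0.3, (1.0 - theta) * 0.8])

theta_old, theta_new = 0.4, 0.7
p_old = joint(theta_old)
q = p_old / p_old.sum()                              # posterior p_old(z | x): the E-step quantity

p_new = joint(theta_new)
marginal = np.log(p_new.sum())                       # M(theta_new): log p_new(x)
complete = (q * np.log(p_new)).sum()                 # E_q[ log p_new(x, z) ]
posterior = (q * np.log(p_new / p_new.sum())).sum()  # E_q[ log p_new(z | x) ]

print(marginal, complete - posterior)                # identical up to float error
```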
Outline Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
Lagrange multipliers
Assume an original optimization problem: maximize f(θ) subject to an equality constraint g(θ) = 0.
We convert it to a new optimization problem: find stationary points of the Lagrangian ℱ(θ, λ) = f(θ) − λ g(θ), in both θ and λ.
Lagrange multipliers: an equivalent problem?
Setting ∂ℱ/∂λ = 0 recovers the constraint g(θ) = 0, and setting ∂ℱ/∂θ = 0 gives ∇f(θ) = λ ∇g(θ): any constrained optimum of the original problem is a stationary point of ℱ.
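(Not from the slides: a tiny symbolic example of the recipe, using SymPy and an invented objective: maximize log x + log y subject to x + y = 1.)

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
lam = sp.Symbol('lam')                     # the Lagrange multiplier
f = sp.log(x) + sp.log(y)                  # objective
g = x + y - 1                              # constraint: g = 0
F = f - lam * g                            # the Lagrangian

# Stationary point: all partial derivatives of F vanish
sol = sp.solve([sp.diff(F, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(sol)                                 # [{x: 1/2, y: 1/2, lam: 2}]
```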
Outline Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
Probabilistic Estimation of Rolling a Die
N different (independent) rolls: p(x_1, x_2, …, x_N) = p(x_1) p(x_2) ⋯ p(x_N) = ∏_i p(x_i)
Generative Story: for roll i = 1 to N: x_i ∼ Cat(θ), where θ is a probability distribution over the 6 sides of the die: 0 ≤ θ_k ≤ 1 for all k, and ∑_{k=1}^{6} θ_k = 1
Example observations: x_1 = 1, x_2 = 5, x_3 = 4, …
Probabilistic Estimation of Rolling a Die
N different (independent) rolls: p(x_1, x_2, …, x_N) = ∏_i p(x_i)
Generative Story: for roll i = 1 to N: x_i ∼ Cat(θ)
Maximize Log-likelihood: ℒ(θ) = ∑_i log p_θ(x_i) = ∑_i log θ_{x_i}
Example observations: x_1 = 1, x_2 = 5, x_3 = 4, …
Probabilistic Estimation of Rolling a Die
N different (independent) rolls: p(x_1, x_2, …, x_N) = ∏_i p(x_i)
Generative Story: for roll i = 1 to N: x_i ∼ Cat(θ)
Maximize Log-likelihood: ℒ(θ) = ∑_i log θ_{x_i}
Q: What's an easy way to maximize this, exactly as written (even without calculus)?
Probabilistic Estimation of Rolling a Die
N different (independent) rolls: p(x_1, x_2, …, x_N) = ∏_i p(x_i)
Generative Story: for roll i = 1 to N: x_i ∼ Cat(θ)
Maximize Log-likelihood: ℒ(θ) = ∑_i log θ_{x_i}
Q: What's an easy way to maximize this, exactly as written (even without calculus)?
A: Just keep increasing θ_k (we know θ must be a distribution, but that constraint isn't specified anywhere in the objective)
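(Not from the slides: a throwaway check, with invented counts, that the unconstrained objective really is unbounded.)

```python
# Without the sum-to-one constraint, sum_i log(theta_{x_i}) grows without bound
# as we inflate every theta_k.
import numpy as np

counts = np.array([3, 1, 2, 2, 1, 1])            # made-up counts of each face
for scale in [1.0, 10.0, 100.0]:
    theta = scale * np.ones(6)                   # not a valid distribution
    print(scale, (counts * np.log(theta)).sum()) # keeps increasing with scale
```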
Probabilistic Estimation of Rolling a Die
N different (independent) rolls: p(x_1, x_2, …, x_N) = ∏_i p(x_i)
Maximize Log-likelihood (with distribution constraints):
ℒ(θ) = ∑_i log θ_{x_i}  s.t.  ∑_{k=1}^{6} θ_k = 1
(we can include the inequality constraints 0 ≤ θ_k, but they complicate the problem and, right now, are not needed)
Solve using Lagrange multipliers.
Probabilistic Estimation of Rolling a Die
N different (independent) rolls: p(x_1, x_2, …, x_N) = ∏_i p(x_i)
Maximize Log-likelihood (with distribution constraints), via the Lagrangian:
ℱ(θ, λ) = ∑_i log θ_{x_i} − λ (∑_{k=1}^{6} θ_k − 1)
(we can include the inequality constraints 0 ≤ θ_k, but they complicate the problem and, right now, are not needed)
∂ℱ/∂θ_k = ∑_{i: x_i = k} 1/θ_k − λ        ∂ℱ/∂λ = −∑_{k=1}^{6} θ_k + 1
Probabilistic Estimation of Rolling a Die
N different (independent) rolls: p(x_1, x_2, …, x_N) = ∏_i p(x_i)
Maximize Log-likelihood (with distribution constraints), via the Lagrangian:
ℱ(θ, λ) = ∑_i log θ_{x_i} − λ (∑_{k=1}^{6} θ_k − 1)
Setting ∂ℱ/∂θ_k = 0 gives θ_k = (∑_{i: x_i = k} 1) / λ, with the optimal λ chosen so that ∑_{k=1}^{6} θ_k = 1.
Probabilistic Estimation of Rolling a Die
N different (independent) rolls: p(x_1, x_2, …, x_N) = ∏_i p(x_i)
Maximize Log-likelihood (with distribution constraints), via the Lagrangian:
ℱ(θ, λ) = ∑_i log θ_{x_i} − λ (∑_{k=1}^{6} θ_k − 1)
The optimal λ (when ∑_{k=1}^{6} θ_k = 1) is λ = ∑_k ∑_{i: x_i = k} 1 = N, so
θ_k = (∑_{i: x_i = k} 1) / N = count(k) / N
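(Not from the slides: a quick simulation, with a made-up die distribution, confirming that the closed-form MLE θ_k = count(k)/N recovers the true probabilities.)

```python
import numpy as np

rng = np.random.default_rng(0)
true_theta = np.array([0.1, 0.1, 0.2, 0.2, 0.1, 0.3])   # invented die probabilities
rolls = rng.choice(6, size=10_000, p=true_theta)         # x_i in {0, ..., 5}

counts = np.bincount(rolls, minlength=6)
theta_mle = counts / counts.sum()                        # the Lagrange-multiplier solution
print(theta_mle)                                         # close to true_theta
```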
Example: Conditionally Rolling a Die
Before: p(x_1, x_2, …, x_N) = p(x_1) p(x_2) ⋯ p(x_N) = ∏_i p(x_i)
Add complexity to better explain what we see:
p(z_1, x_1, z_2, x_2, …, z_N, x_N) = p(z_1) p(x_1 | z_1) ⋯ p(z_N) p(x_N | z_N) = ∏_i p(x_i | z_i) p(z_i)
A penny with p(heads) = λ, p(tails) = 1 − λ decides which coin to flip next: a dollar coin with p(heads) = γ, p(tails) = 1 − γ, or a dime with p(heads) = ψ, p(tails) = 1 − ψ.
Example observations: z_1 = H, x_1 = 1; z_2 = T, x_2 = 5; …
Example: Conditionally Rolling a Die
p(z_1, x_1, z_2, x_2, …, z_N, x_N) = ∏_i p(x_i | z_i) p(z_i)
Generative Story:
λ = distribution over the penny, γ = distribution for the dollar coin, ψ = distribution over the dime
for item i = 1 to N:
  z_i ∼ Bernoulli(λ)
  if z_i = H: x_i ∼ Bernoulli(γ)
  else: x_i ∼ Bernoulli(ψ)
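(Not from the slides: a sketch tying this story to EM. It samples (z_i, x_i) pairs with invented parameter values, hides z, and re-estimates the penny parameter λ with EM; γ and ψ are held fixed at their true values so that the single-flip model stays identifiable.)

```python
import numpy as np

rng = np.random.default_rng(1)
lam_true, gamma, psi = 0.6, 0.9, 0.2          # heads-probs: penny, dollar coin, dime
N = 100_000

z = rng.random(N) < lam_true                  # hidden: which coin gets flipped
x = np.where(z, rng.random(N) < gamma, rng.random(N) < psi).astype(float)

lam = 0.5                                     # arbitrary starting guess
for _ in range(50):
    # E-step: q_i = p(z_i = heads | x_i) under the current lambda
    p_head = lam * np.where(x == 1, gamma, 1 - gamma)
    p_tail = (1 - lam) * np.where(x == 1, psi, 1 - psi)
    q = p_head / (p_head + p_tail)
    # M-step: maximize the expected complete-data log-likelihood: lambda = mean soft count
    lam = q.mean()

print(lam_true, lam)                          # the EM estimate should be close to 0.6
```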
Outline Recap of EM Math: Lagrange Multipliers for constrained optimization Probabilistic Modeling Example: Die Rolling Directed Graphical Models Naïve Bayes Hidden Markov Models Message Passing: Directed Graphical Model Inference Most likely sequence Total (marginal) probability EM in D-PGMs
Classify with Bayes Rule
argmax_Y p(Y | X) = argmax_Y [ log p(X | Y) + log p(Y) ]
likelihood: p(X | Y); prior: p(Y)
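(Not from the slides: a toy illustration of this decision rule; the class priors and likelihoods below are invented.)

```python
import numpy as np

log_prior = np.log(np.array([0.7, 0.3]))            # p(y) for two classes
log_likelihood = np.log(np.array([0.05, 0.20]))     # p(x | y) for one observed x

y_hat = np.argmax(log_likelihood + log_prior)
print(y_hat)                                        # class 1 wins here: 0.3*0.20 > 0.7*0.05
```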
The Bag of Words Representation Adapted from Jurafsky & Martin (draft)
Bag of Words Representation: a document is reduced to a vector of word counts, e.g. seen: 2, sweet: 1, whimsical: 1, recommend: 1, happy: 1, …, which the classifier γ(·) maps to a class c. Adapted from Jurafsky & Martin (draft)
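(Not from the slides: a minimal bag-of-words count representation for one document; the example text is my own.)

```python
from collections import Counter

doc = "I loved this movie sweet whimsical and I would recommend it happily"
bow = Counter(doc.lower().split())      # word order is discarded; only counts remain
print(bow["i"], bow["recommend"])       # 2 1
```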
Naïve Bayes: A Generative Story
Generative Story (global parameters):
ϕ = distribution over K labels
for label k = 1 to K: θ_k = generate parameters
Naïve Bayes: A Generative Story
Generative Story:
ϕ = distribution over K labels
for label k = 1 to K: θ_k = generate parameters
for item i = 1 to N: y_i ∼ Cat(ϕ)
Naïve Bayes: A Generative Story
Generative Story:
ϕ = distribution over K labels
for label k = 1 to K: θ_k = generate parameters
for item i = 1 to N (local variables):
  y_i ∼ Cat(ϕ)
  for each feature j: x_ij ∼ F_j(θ_{y_i})
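(Not from the slides: a sketch of this generative story with Bernoulli features F_j; all parameter values are assumptions for the demo.)

```python
import numpy as np

rng = np.random.default_rng(2)
phi = np.array([0.4, 0.6])                 # distribution over K = 2 labels
theta = np.array([[0.9, 0.1, 0.5],         # per-label Bernoulli parameters,
                  [0.2, 0.8, 0.5]])        # one row per label, one column per feature

N = 5
y = rng.choice(2, size=N, p=phi)           # y_i ~ Cat(phi)
x = rng.random((N, 3)) < theta[y]          # x_ij ~ Bernoulli(theta[y_i, j]), independent given y_i
print(y)
print(x.astype(int))
```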