10-418 / 10-618 Machine Learning for Structured Data
Machine Learning Department, School of Computer Science, Carnegie Mellon University

Bayesian Inference for Parameter Estimation + Topic Modeling
Matt Gormley
Lecture 20, Nov. 4, 2019
Reminders
• Homework 3: Structured SVM
  – Out: Fri, Oct. 24
  – Due: Wed, Nov. 6 at 11:59pm
• Homework 4: Topic Modeling
  – Out: Wed, Nov. 6
  – Due: Mon, Nov. 18 at 11:59pm
TOPIC MODELING
Topic Modeling
Motivation: Suppose you're given a massive corpus and asked to carry out the following tasks:
• Organize the documents into thematic categories
• Describe the evolution of those categories over time
• Enable a domain expert to analyze and understand the content
• Find relationships between the categories
• Understand how authorship influences the content

Topic Modeling: A method of (usually unsupervised) discovery of latent or hidden structure in a corpus
• Applied primarily to text corpora, but the techniques are more general
• Provides a modeling toolbox
• Has prompted the exploration of a variety of new inference methods to accommodate large-scale datasets
Topic Modeling
• Dirichlet-multinomial regression (DMR) topic model on ICML (Mimno & McCallum, 2008)
  http://www.cs.umass.edu/~mimno/icml100.html
Topic Modeling • Map of NIH Grants (Talley et al., 2011) https://app.nihmaps.org/
Other Applications of Topic Models
• Spatial LDA (Wang & Grimson, 2007)
  [figure: image segmentation results, panels labeled Manual, LDA, SLDA]
Outline
• Applications of Topic Modeling
• Latent Dirichlet Allocation (LDA)
  1. Beta-Bernoulli
  2. Dirichlet-Multinomial
  3. Dirichlet-Multinomial Mixture Model
  4. LDA
• Bayesian Inference for Parameter Estimation
  – Exact inference
  – EM
  – Monte Carlo EM
  – Gibbs sampler
  – Collapsed Gibbs sampler
• Extensions of LDA
  – Correlated topic models
  – Dynamic topic models
  – Polylingual topic models
  – Supervised LDA
BAYESIAN INFERENCE FOR NAÏVE BAYES
Beta-Bernoulli Model
• Beta Distribution:

$$f(\phi \mid \alpha, \beta) = \frac{1}{B(\alpha, \beta)} \phi^{\alpha - 1} (1 - \phi)^{\beta - 1}$$

[figure: Beta densities $f(\phi \mid \alpha, \beta)$ on $\phi \in [0, 1]$ for $(\alpha, \beta) = (0.1, 0.9), (0.5, 0.5), (1.0, 1.0), (5.0, 5.0), (10.0, 5.0)$]
Beta-Bernoulli Model
• Generative Process:
  φ ∼ Beta(α, β) [draw distribution over words]
  For each word n ∈ {1, ..., N}:
    x_n ∼ Bernoulli(φ) [draw word]
• Example corpus (heads/tails):
  H T T H H T T H H H
  x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10
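To make the process concrete, here is a minimal numpy sketch of the Beta-Bernoulli generative story; the seed, hyperparameter values, and corpus size are illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)     # illustrative seed
alpha, beta = 5.0, 5.0             # illustrative hyperparameters
N = 10                             # corpus size, matching the toy example

phi = rng.beta(alpha, beta)        # phi ~ Beta(alpha, beta): probability of heads
x = rng.binomial(1, phi, size=N)   # x_n ~ Bernoulli(phi), n = 1..N (1 = H, 0 = T)
print(f"phi = {phi:.3f}, corpus = {' '.join('H' if xi else 'T' for xi in x)}")
```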
Dirichlet-Multinomial Model
• Dirichlet Distribution: the generalization of the Beta distribution from 2 outcomes to K. The Beta density and plots on the previous slide are the special case K = 2.
Dirichlet-Multinomial Model
• Dirichlet Distribution:

$$p(\vec{\phi} \mid \vec{\alpha}) = \frac{1}{B(\vec{\alpha})} \prod_{k=1}^{K} \phi_k^{\alpha_k - 1}, \quad \text{where } B(\vec{\alpha}) = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k)}{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}$$

[figure: two 3D surface plots of the density $p(\vec{\phi} \mid \vec{\alpha})$ over the simplex coordinates $(\phi_1, \phi_2)$]
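As a quick sanity check of the density above, scipy's `dirichlet` supports both evaluation and sampling; the concentration parameters and test point below are illustrative.

```python
import numpy as np
from scipy.stats import dirichlet

alpha = np.array([5.0, 5.0, 5.0])   # illustrative symmetric concentration, K = 3
phi = np.array([0.2, 0.3, 0.5])     # a point on the probability simplex

print(dirichlet.pdf(phi, alpha))    # evaluates p(phi | alpha) per the formula above
print(dirichlet.rvs(alpha, size=2)) # two sampled distributions over K = 3 outcomes
```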
Dirichlet-Multinomial Model
• Generative Process:
  φ ∼ Dir(β) [draw distribution over words]
  For each word n ∈ {1, ..., N}:
    x_n ∼ Mult(1, φ) [draw word]
• Example corpus:
  the he is the and the she she is is
  x_1 x_2 x_3 x_4 x_5 x_6 x_7 x_8 x_9 x_10
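A minimal numpy sketch of this generative process, assuming a toy five-word vocabulary and a symmetric prior (both illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)              # illustrative seed
vocab = ["the", "he", "is", "and", "she"]   # toy vocabulary (illustrative)
beta = np.ones(len(vocab))                  # symmetric Dirichlet prior (illustrative)
N = 10

phi = rng.dirichlet(beta)                   # phi ~ Dir(beta): distribution over words
x = [vocab[rng.choice(len(vocab), p=phi)]   # x_n ~ Mult(1, phi): draw each word
     for _ in range(N)]
print(" ".join(x))
```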
Dirichlet-Multinomial Mixture Model
• Generative Process:
  For each topic k ∈ {1, ..., K}:
    φ_k ∼ Dir(β) [draw distribution over words]
  θ ∼ Dir(α) [draw distribution over topics]
  For each document m ∈ {1, ..., M}:
    z_m ∼ Mult(1, θ) [draw topic assignment]
    For each word n ∈ {1, ..., N_m}:
      x_mn ∼ Mult(1, φ_{z_m}) [draw word]
• Example corpus:
  Document 1: the he is (x_11 x_12 x_13)
  Document 2: the and the (x_21 x_22 x_23)
  Document 3: she she is is (x_31 x_32 x_33 x_34)
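A numpy sketch of the mixture model's generative story, adding the topic layer to the previous snippet. The document lengths mirror the toy corpus; the number of topics, vocabulary, priors, and seed are illustrative assumptions. The key property to notice: each document gets exactly one topic.

```python
import numpy as np

rng = np.random.default_rng(0)                 # illustrative seed
vocab = ["the", "he", "is", "and", "she"]      # toy vocabulary (illustrative)
K, N_m = 2, [3, 3, 4]                          # K topics; lengths of the 3 toy documents
beta, alpha = np.ones(len(vocab)), np.ones(K)  # symmetric priors (illustrative)

phi = rng.dirichlet(beta, size=K)              # phi_k ~ Dir(beta) for each topic k
theta = rng.dirichlet(alpha)                   # theta ~ Dir(alpha): ONE corpus-wide topic dist.
for m, N in enumerate(N_m):
    z_m = rng.choice(K, p=theta)               # z_m ~ Mult(1, theta): one topic per document
    doc = [vocab[rng.choice(len(vocab), p=phi[z_m])] for _ in range(N)]
    print(f"Document {m + 1} (topic {z_m}): {' '.join(doc)}")
```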
Bayesian Inference for Naïve Bayes
Whiteboard:
– Naïve Bayes is not Bayesian
– What if we observed both words and topics?
– Dirichlet-Multinomial in the fully observed setting is just Naïve Bayes
– Three ways of estimating parameters:
  1. MLE for Naïve Bayes
  2. MAP estimation for Naïve Bayes
  3. Bayesian parameter estimation for Naïve Bayes
Dirichlet-Multinomial Model
• The Dirichlet is conjugate to the Multinomial:
  φ ∼ Dir(β) [draw distribution over words]
  For each word n ∈ {1, ..., N}:
    x_n ∼ Mult(1, φ) [draw word]
• The posterior of φ is

$$p(\phi \mid X) = \frac{p(X \mid \phi)\, p(\phi)}{p(X)}$$

• Define the count vector n such that n_t denotes the number of times word t appeared
• Then the posterior is also a Dirichlet distribution:

$$\phi \mid X \sim \mathrm{Dir}(\beta + n)$$
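The one-line derivation behind this conjugacy result: multiply the multinomial likelihood by the Dirichlet prior defined above and match exponents (V is the vocabulary size):

$$p(\phi \mid X) \propto p(X \mid \phi)\, p(\phi) \propto \prod_{t=1}^{V} \phi_t^{n_t} \cdot \prod_{t=1}^{V} \phi_t^{\beta_t - 1} = \prod_{t=1}^{V} \phi_t^{(\beta_t + n_t) - 1} \propto \mathrm{Dir}(\phi \mid \beta + n)$$

Normalizing recovers exactly the Dirichlet density with updated parameters β + n, so Bayesian updating reduces to adding observed counts to the prior's pseudo-counts.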
LATENT DIRICHLET ALLOCATION (LDA)
Mixture vs. Admixture (LDA)
[plate diagrams: the mixture model (one topic assignment per document) vs. the admixture model (one topic assignment per word)]
Diagrams from Wallach, JHU 2011, slides
Latent Dirichlet Allocation
• Generative Process:
  For each topic k ∈ {1, ..., K}:
    φ_k ∼ Dir(β) [draw distribution over words]
  For each document m ∈ {1, ..., M}:
    θ_m ∼ Dir(α) [draw distribution over topics]
    For each word n ∈ {1, ..., N_m}:
      z_mn ∼ Mult(1, θ_m) [draw topic assignment]
      x_mn ∼ Mult(1, φ_{z_mn}) [draw word]
• Example corpus:
  Document 1: the he is (x_11 x_12 x_13)
  Document 2: the and the (x_21 x_22 x_23)
  Document 3: she she is is (x_31 x_32 x_33 x_34)
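A numpy sketch of LDA's generative process, for contrast with the mixture model sketch above: θ_m is now drawn per document and z_mn per word, so a single document can mix topics. Vocabulary, priors, and seed are again illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)                 # illustrative seed
vocab = ["the", "he", "is", "and", "she"]      # toy vocabulary (illustrative)
K, N_m = 2, [3, 3, 4]                          # K topics; lengths of the 3 toy documents
beta, alpha = np.ones(len(vocab)), np.ones(K)  # symmetric priors (illustrative)

phi = rng.dirichlet(beta, size=K)              # phi_k ~ Dir(beta) for each topic k
for m, N in enumerate(N_m):
    theta_m = rng.dirichlet(alpha)             # theta_m ~ Dir(alpha): PER-DOCUMENT topic dist.
    doc = []
    for n in range(N):
        z_mn = rng.choice(K, p=theta_m)        # z_mn ~ Mult(1, theta_m): topic PER WORD
        doc.append(vocab[rng.choice(len(vocab), p=phi[z_mn])])  # x_mn ~ Mult(1, phi_{z_mn})
    print(f"Document {m + 1}: {' '.join(doc)}")
```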
(Blei, Ng, & Jordan, 2003) LDA for Topic Modeling
Dirichlet(β)
[figure: six bar charts of word probabilities, ϕ_1, ..., ϕ_6, one per topic, each drawn from the Dirichlet(β) prior]
• The generative story begins with only a Dirichlet prior over the topics.
• Each topic is defined as a Multinomial distribution over the vocabulary, parameterized by ϕ_k.
(Blei, Ng, & Jordan, 2003) LDA for Topic Modeling
Dirichlet(β)
[figure: bar charts for topics ϕ_1, ..., ϕ_6; one topic's high-probability words are shown]
{hockey}: team, season, hockey, player, penguins, ice, canadiens, puck, montreal, stanley, cup
• A topic is visualized as its high probability words.
• A pedagogical label is used to identify the topic.
(Blei, Ng, & Jordan, 2003) LDA for Topic Modeling
Dirichlet(β)
[figure: bar charts for topics ϕ_1, ..., ϕ_6, now labeled: {Canadian gov.}, {government}, {hockey}, {U.S. gov.}, {baseball}, {Japan}]
• A topic is visualized as its high probability words.
• A pedagogical label is used to identify the topic.
(Blei, Ng, & Jordan, 2003) LDA for Topic Modeling
[figure: the six labeled topics ϕ_1, ..., ϕ_6 drawn from Dirichlet(β)]
Dirichlet(α)
θ_1 = [bar chart: document 1's distribution over the six topics, drawn from Dirichlet(α)]
(Blei, Ng, & Jordan, 2003) LDA for Topic Modeling
[figure: the six labeled topics and θ_1 as above; document 1's words are then generated one at a time]
The 54/40' boundary dispute is still unresolved, and Canadian and US