Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 20: Topic modeling and variational inference Slides adapted from Jordan Boyd-Graber, Chris Ketelsen 1
Administrivia • Poster printing (stay tuned!) • HW 5 (final homework) is due next Friday! • Midpoint feedback 2
Learning Objectives • Learn about latent Dirichlet allocation • Understand the intuition behind variational inference 3
Topic models • Discrete count data 4
Topic models • Suppose you have a huge number of documents • Want to know what's going on • Can't read them all (e.g., every New York Times article from the '90s) • Topic models offer a way to get a corpus-level view of major themes • Unsupervised 5
Why should you care? • Neat way to explore/understand corpus collections • E-discovery • Social media • Scientific data • NLP Applications • Word sense disambiguation • Discourse segmentation • Psychology: word meaning, polysemy • A general way to model count data and a general inference algorithm 6
Conceptual approach • Input: a text corpus and the number of topics K • Output: • K topics, each topic is a list of words • Topic assignment for each document [Figure: a corpus of example headlines, including "Forget the Bootleg, Just Download the Movie Legally"; "Multiplex Heralded As Linchpin To Growth"; "The Shape of Cinema, Transformed At the Click of a Mouse"; "A Peaceful Crew Puts Muppets Where Its Mouth Is"; "Stock Trades: A Better Deal For Investors Isn't Simple"; "The three big Internet portals begin to distinguish among themselves as shopping malls"; "Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens"] 7
Conceptual approach • K topics, each topic is a list of words • TOPIC 1: computer, technology, system, service, site, phone, internet, machine • TOPIC 2: sell, sale, store, product, business, advertising, market, consumer • TOPIC 3: play, film, movie, theater, production, star, director, stage 8
Conceptual approach • Topic assignment for each document [Figure: documents linked to their topics: TOPIC 1 "TECHNOLOGY", TOPIC 2 "BUSINESS", TOPIC 3 "ENTERTAINMENT"; documents include "Internet portals begin to distinguish among themselves as shopping malls"; "Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens"; "Stock Trades: A Better Deal For Investors Isn't Simple"; "Forget the Bootleg, Just Download the Movie Legally"; "Multiplex Heralded As Linchpin To Growth"; "The Shape of Cinema, Transformed At the Click of a Mouse"; "A Peaceful Crew Puts Muppets Where Its Mouth Is"] 9
Topics from Science 10
Topic models • Discrete count data • Gaussian distributions are not appropriate 11
Generative model: Latent Dirichlet Allocation • Generate a document, or a bag of words • Blei, Ng, Jordan. Latent Dirichlet Allocation. JMLR, 2003. 12
Generative model: Latent Dirichlet Allocation • Generate a document, or a bag of words • Multinomial distribution • Distribution over discrete outcomes • Represented by a non-negative vector that sums to one • Picture representation: a point on the probability simplex [Figure: simplex with corners (1,0,0), (0,1,0), (0,0,1) and interior points (1/3,1/3,1/3), (1/4,1/4,1/2), (1/2,1/2,0)] • These vectors come from a Dirichlet distribution 14
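A minimal numpy sketch of these two building blocks (the probability values are purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    # A multinomial over three outcomes, e.g. the slide's (1/4, 1/4, 1/2):
    p = np.array([0.25, 0.25, 0.5])
    counts = rng.multinomial(20, p)  # outcome counts from 20 draws

    # Such probability vectors can themselves be drawn from a Dirichlet,
    # i.e. a distribution over points on the simplex:
    phi = rng.dirichlet(np.array([1.0, 1.0, 1.0]))  # non-negative, sums to one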
Generative story • TOPIC 1: computer, technology, system, service, site, phone, internet, machine • TOPIC 2: sell, sale, store, product, business, advertising, market, consumer • TOPIC 3: play, film, movie, theater, production, star, director, stage 15
Generative story [Figure: example documents generated from the three topics: "The three big Internet portals begin to distinguish among themselves as shopping malls"; "Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens"; "Stock Trades: A Better Deal For Investors Isn't Simple"; "Forget the Bootleg, Just Download the Movie Legally"; "The Shape of Cinema, Transformed At the Click of a Mouse"; "Multiplex Heralded As Linchpin To Growth"; "A Peaceful Crew Puts Muppets Where Its Mouth Is"] 16
Generative story • TOPIC 1: computer, technology, system, service, site, phone, internet, machine • TOPIC 2: sell, sale, store, product, business, advertising, market, consumer • TOPIC 3: play, film, movie, theater, production, star, director, stage • Example document, with each word drawn from one of the topics: "Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's iTunes music store and other online services ..." 20
Missing component: how to generate a multinomial distribution • Answer (from the earlier slides): draw it from a Dirichlet distribution 23
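For reference, the Dirichlet density over a multinomial parameter φ (a standard fact, stated here since the slide content is a figure) is

    p(φ | α) = [ Γ(∑_k α_k) / ∏_k Γ(α_k) ] ∏_k φ_k^{α_k − 1}

where larger α_k pulls probability mass toward outcome k, and the overall magnitude of α controls how concentrated the draws are around the mean.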
Conjugacy of Dirichlet and Multinomial • If φ ∼ Dir(α), w ∼ Mult(φ), and n_k = |{w_i : w_i = k}|, then

    p(φ | α, w) ∝ p(w | φ) p(φ | α)   (1)
                ∝ ∏_k φ_k^{n_k} ∏_k φ_k^{α_k − 1}   (2)
                ∝ ∏_k φ_k^{α_k + n_k − 1}   (3)

• Conjugacy: this posterior has the same form as the prior 25
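A quick worked example (numbers chosen for illustration): with a symmetric prior α = (1, 1, 1) and observed counts n = (3, 1, 0) from four draws, the posterior is Dir(1 + 3, 1 + 1, 1 + 0) = Dir(4, 2, 1). Updating the prior amounts to simply adding the observed counts to its parameters.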
Making the generative story formal [Plate diagram: λ → β_k (plate over K topics); α → θ_d → z_n → w_n (plate over N word positions, nested in plate over M documents)] • For each topic k ∈ {1, . . . , K}, draw a multinomial distribution β_k from a Dirichlet distribution with parameter λ • For each document d ∈ {1, . . . , M}, draw a multinomial distribution θ_d from a Dirichlet distribution with parameter α • For each word position n ∈ {1, . . . , N}, select a hidden topic z_n from the multinomial distribution parameterized by θ_d • Choose the observed word w_n from the distribution β_{z_n} • A simulation sketch of this story appears below 29
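A minimal simulation of this generative story, as a sketch (numpy assumed; the sizes K, V, M, N and the hyperparameter values are illustrative, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes: K topics, V word types, M documents, N words each
    K, V, M, N = 3, 1000, 5, 50
    lam = 0.1 * np.ones(V)    # topic-word Dirichlet parameter (lambda)
    alpha = 0.5 * np.ones(K)  # document-topic Dirichlet parameter (alpha)

    # For each topic k, draw a distribution over words: beta_k ~ Dir(lambda)
    beta = rng.dirichlet(lam, size=K)  # shape (K, V)

    docs = []
    for d in range(M):
        theta = rng.dirichlet(alpha)  # theta_d ~ Dir(alpha), shape (K,)
        words = []
        for n in range(N):
            z = rng.choice(K, p=theta)    # hidden topic z_n ~ Mult(theta_d)
            w = rng.choice(V, p=beta[z])  # observed word w_n ~ Mult(beta_{z_n})
            words.append(w)
        docs.append(words)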
Topic models: What's important • Topic models (latent variables) • Topics to word types: multinomial distribution • Documents to topics: multinomial distribution • Modeling & Algorithm • Model: story of how your data came to be • Latent variables: missing pieces of your story • Statistical inference: filling in those missing pieces • We use latent Dirichlet allocation (LDA), a fully Bayesian version of pLSI, which is in turn a probabilistic version of LSA 30
Which variables are hidden? • The topic distributions β_k, the document-topic proportions θ_d, and the per-word topic assignments z_n are all latent • Only the words w_n are observed 31
Size of variables • β: K topics, each a distribution over the V word types in the vocabulary (K × V) • θ: one distribution over the K topics per document (M × K) • z: one topic assignment per word position in each document 32
Joint distribution 34
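Putting the four steps of the generative story together, the joint distribution factorizes as

    p(β, θ, z, w | λ, α) = ∏_{k=1}^{K} p(β_k | λ) · ∏_{d=1}^{M} [ p(θ_d | α) ∏_{n=1}^{N} p(z_{d,n} | θ_d) p(w_{d,n} | β_{z_{d,n}}) ]

with one factor per arrow in the plate diagram above.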
Posterior distribution 35
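What we actually want is the posterior over the hidden variables given the observed words:

    p(β, θ, z | w, λ, α) = p(β, θ, z, w | λ, α) / p(w | λ, α)

The denominator p(w | λ, α) requires summing over every possible topic assignment z and integrating over β and θ, which is intractable. This is what motivates approximate inference.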
Variational inference 36
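The idea: posit a simple family of distributions q over the hidden variables and find the member closest to the true posterior in KL divergence. For LDA, a standard choice (following Blei et al., 2003) is the fully factorized mean-field family

    q(β, θ, z) = ∏_k q(β_k | λ̃_k) · ∏_d q(θ_d | γ_d) · ∏_{d,n} q(z_{d,n} | φ_{d,n})

where λ̃_k and γ_d are Dirichlet parameters and φ_{d,n} are multinomial parameters, all tuned to fit the data.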
KL divergence and evidence lower bound 38
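Writing all hidden variables collectively as Z, the decomposition behind this slide is

    log p(w) = E_q[log p(w, Z)] − E_q[log q(Z)] + KL(q(Z) ‖ p(Z | w))

The first two terms form the evidence lower bound (ELBO). Since KL ≥ 0 and log p(w) does not depend on q, maximizing the ELBO over q is equivalent to minimizing the KL divergence between q and the true posterior.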
A different way to get ELBO • Jensen’s inequality 39
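Concretely, multiplying and dividing by q inside the marginal and applying Jensen's inequality (log is concave, so the log of an expectation is at least the expectation of the log):

    log p(w) = log ∑_Z q(Z) p(w, Z) / q(Z) ≥ ∑_Z q(Z) log [ p(w, Z) / q(Z) ] = E_q[log p(w, Z)] − E_q[log q(Z)]

which recovers the same ELBO as the KL decomposition above.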
Evidence Lower Bound 41
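A useful rearrangement of the bound (a standard identity, included here for reference):

    ELBO(q) = E_q[log p(w | Z)] − KL(q(Z) ‖ p(Z))

The first term rewards q for explaining the observed words; the second keeps q close to the prior.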