Department of Computer Science • CSCI 5622: Machine Learning • Chenhao Tan • Lecture 19: EM Algorithm, Topic Modeling • Slides adapted from Jordan Boyd-Graber and Chris Ketelsen 1
Administrivia • HW4 due, HW5 out • Remember that only your four highest homework scores count • Final project midpoint presentation • For the final project, each person will be asked to summarize what everyone on the team did • Contact information for printing 2
Second Month Survey [Figure: results of the second survey shown alongside the first survey] 3
Second Month Survey • Conflicting opinions • wide variety of models, good explanations, good homeworks • Clarity of HW grading is the worst I have ever had for a class. • Depth of content covered • Course is too theory heavy • I liked that the instructor not only requested feedback often, but also acted upon the feedback, changing a few things about how the class and slides are presented. 4
Second Month Survey • Increase exam duration • The professor needs to slow down, and sacrifice some of the math subtleties and complexities in favor of concrete understanding of the topics. • Go into the weeds of the math less 5
Learning Objectives • Learn about Expectation-Maximization algorithm • Learn about latent Dirichlet allocation 6
Gaussian Mixture Models 7
Gaussian Mixture Models [Figure: scatter plot of unlabeled two-dimensional data points, axes x1 and x2 ranging from -4 to 4] 9
Latent Variables • z’s correspond to the latent structure that we try to learn in unsupervised learning • From a modeling perspective, they are usually referred to as latent variables 18
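To make the role of z concrete, here is a minimal sketch of how a Gaussian mixture generates data, assuming one dimension and hand-picked parameters (the function name `sample_gmm` and all parameter values are hypothetical). Under the mixture, p(x) = Σ_k π_k N(x | μ_k, σ_k²): first a latent component z is drawn with probability π_z, then x is drawn from that component's Gaussian. A learner only ever observes x.

```python
import random


def sample_gmm(n, weights, means, stds, seed=0):
    """Draw n points from a 1-D Gaussian mixture.

    Returns (xs, zs): the observed values and the latent component labels
    that generated them. An unsupervised learner only sees xs; zs is the
    hidden structure it tries to recover.
    """
    rng = random.Random(seed)
    xs, zs = [], []
    for _ in range(n):
        # Latent step: pick component z with probability weights[z].
        z = rng.choices(range(len(weights)), weights=weights)[0]
        # Observed step: draw x from that component's Gaussian.
        xs.append(rng.gauss(means[z], stds[z]))
        zs.append(z)
    return xs, zs


# Hypothetical two-component mixture; parameters chosen for illustration only.
xs, zs = sample_gmm(500, weights=[0.3, 0.7], means=[-2.0, 2.0], stds=[0.5, 1.0])
```

Fitting a GMM means running this story backwards: given only `xs`, infer plausible `zs` and the parameters.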
EM Algorithm 19
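EM Algorithm • EM stands for Expectation-Maximization • A classic algorithm introduced by Dempster, Laird, and Rubin (1977) • An iterative method that alternates an E-step (compute the posterior over the latent variables z) and an M-step (re-estimate the parameters given that posterior) 23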
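A minimal sketch of EM for a one-dimensional GMM, using only the Python standard library. The helper names (`normal_pdf`, `em_gmm_1d`), the quantile-based initialization, the fixed iteration count, and the toy data are all assumptions for illustration, not the lecture's own code. The E-step computes the posterior p(z_i = j | x_i) for every point; the M-step re-estimates the weights, means, and variances from those soft counts.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, k, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture to xs with EM.

    Returns (weights, means, variances, log_liks), where log_liks[t] is the
    data log-likelihood at the start of iteration t.
    """
    n = len(xs)
    ordered = sorted(xs)
    # Initialization (an assumption): means at evenly spaced quantiles,
    # every variance set to the overall data variance, uniform weights.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(xs) / n
    variances = [sum((x - mean_all) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    log_liks = []
    for _ in range(n_iter):
        # E-step: responsibilities r_ij = p(z_i = j | x_i, current parameters).
        resp, ll = [], 0.0
        for x in xs:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            ll += math.log(total)
            resp.append([pj / total for pj in p])
        log_liks.append(ll)
        # M-step: re-estimate each component from its soft counts.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
    return weights, means, variances, log_liks


# Hypothetical data: 200 points around -2 and 300 points around 3.
rng = random.Random(1)
data = [rng.gauss(-2.0, 0.5) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(300)]
weights, means, variances, log_liks = em_gmm_1d(data, k=2)
```

Each EM iteration should never decrease the log-likelihood, which is why the sketch records it: a drop would point to a bug. Because EM only finds a local optimum, a different initialization can lead to a different fit.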
GMM and K-means 36
GMMs and the EM algorithm • GMMs fit with the EM algorithm suffer from some of the same problems as K-means • Don't really work with categorical data • Usually converge only to a local optimum (a local maximum of the likelihood), not the global one • Have to determine the number of clusters • Only generate convex clusters • But they also have certain advantages • The clusters are allowed different shapes • We get a soft partitioning of the data 37
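The two advantages can be seen on a single point. The sketch below, with hypothetical parameters and helper names, compares the hard nearest-mean rule used by K-means with GMM responsibilities, where the second component is much wider (variance 9 versus 1).

```python
import math


def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def kmeans_assign(x, means):
    """Hard assignment: index of the nearest mean, as in K-means."""
    return min(range(len(means)), key=lambda j: (x - means[j]) ** 2)


def gmm_responsibilities(x, weights, means, variances):
    """Soft assignment: posterior p(z = j | x) under a Gaussian mixture."""
    p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(p)
    return [pj / total for pj in p]


means = [0.0, 4.0]
hard = kmeans_assign(1.8, means)
soft = gmm_responsibilities(1.8, [0.5, 0.5], means, [1.0, 9.0])
```

The point 1.8 is closer to the first mean, so the hard rule assigns it to cluster 0, but the wider second component explains it slightly better: its responsibility is roughly 0.56, and the GMM keeps that uncertainty instead of discarding it.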
Topic models • Discrete count data 38
Topic models • Suppose you have a huge number of documents • Want to know what's going on • Can't read them all (e.g., every New York Times article from the 1990s) • Topic models offer a way to get a corpus-level view of major themes • Unsupervised 39
Conceptual approach • Input: a text corpus and the number of topics K • Output: • K topics, each topic is a list of words • Topic assignment for each document [Figure: an example corpus of headlines: "Forget the Bootleg, Just Download the Movie Legally"; "Multiplex Heralded As Linchpin To Growth"; "The Shape of Cinema, Transformed At the Click of a Mouse"; "A Peaceful Crew Puts Muppets Where Its Mouth Is"; "Stock Trades: A Better Deal For Investors Isn't Simple"; "The three big Internet portals begin to distinguish among themselves as shopping malls"; "Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens"] 40
Conceptual approach • K topics, each topic is a list of words • TOPIC 1: computer, technology, system, service, site, phone, internet, machine • TOPIC 2: sell, sale, store, product, business, advertising, market, consumer • TOPIC 3: play, film, movie, theater, production, star, director, stage 41
Conceptual approach • Topic assignment for each document [Figure: the example headlines arranged among TOPIC 1 "TECHNOLOGY", TOPIC 2 "BUSINESS", and TOPIC 3 "ENTERTAINMENT"] 42
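In LDA each topic is a probability distribution over the vocabulary, and word lists like the ones on these slides are typically its most probable words; a document's "topic assignment" is commonly displayed as the largest entry of its topic proportions. The sketch below assumes tiny hand-made distributions (hypothetical values and helper names, not fitted to any corpus) just to show that read-out step.

```python
def top_words(topic, n):
    """Return the n most probable words of a topic (a dict word -> probability)."""
    ranked = sorted(topic.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]


def dominant_topic(theta):
    """Index of the largest entry in a document's topic proportions."""
    return max(range(len(theta)), key=lambda k: theta[k])


# Hypothetical toy values, not fitted to any corpus.
topics = [
    {"computer": 0.4, "technology": 0.3, "internet": 0.2, "sell": 0.1},
    {"sell": 0.5, "market": 0.3, "consumer": 0.15, "film": 0.05},
    {"film": 0.45, "movie": 0.35, "theater": 0.15, "computer": 0.05},
]
theta_doc = [0.1, 0.2, 0.7]  # hypothetical topic proportions for one headline

word_lists = [top_words(t, 2) for t in topics]
assigned = dominant_topic(theta_doc)
```

Reporting only the dominant topic is a display choice; the model itself gives every document a full mixture over topics.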
Topics from Science 43
Why should you care? • Neat way to explore/understand corpus collections • E-discovery • Social media • Scientific data • NLP Applications • Word sense disambiguation • Discourse segmentation • Psychology: word meaning, polysemy • A general way to model count data and a general inference algorithm 44
Topic models • Discrete count data • Gaussian distributions are not appropriate 45
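A bag of words keeps only how often each word occurs, so each document becomes a vector of non-negative integer counts; a continuous Gaussian, which puts mass on fractional and negative values, is a poor fit for that. A minimal sketch, assuming a deliberately naive tokenizer (whitespace split, strip surrounding punctuation, lowercase); `bag_of_words` is a hypothetical helper name.

```python
from collections import Counter


def bag_of_words(text):
    """Map a document to word counts, discarding word order."""
    tokens = [t.strip(".,:;!?'\"").lower() for t in text.split()]
    return Counter(t for t in tokens if t)


bow = bag_of_words("Forget the Bootleg, Just Download the Movie Legally")
```

Here "the" gets a count of 2 and every other word a count of 1; the headline's word order is gone.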
Generative model: Latent Dirichlet Allocation • Generate a document, or a bag of words • Blei, Ng, Jordan. Latent Dirichlet Allocation. JMLR, 2003. 46
Generative model: Latent Dirichlet Allocation • Generate a document, or a bag of words • Multinomial distribution • Distribution over discrete outcomes • Represented by a non-negative vector that sums to one • Picture representation [Figure: example vectors (1,0,0), (0,1,0), (0,0,1), (1/3,1/3,1/3), (1/4,1/4,1/2), (1/2,1/2,0)] • Drawn from a Dirichlet distribution 48
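A minimal sketch of the two properties listed above, assuming three outcomes: a valid multinomial parameter is non-negative and sums to one, and sampling from it yields discrete outcomes with those probabilities. The helper names are hypothetical; the vectors are the ones from the slide's picture.

```python
import random


def is_distribution(p, tol=1e-9):
    """True if p is a valid multinomial parameter: non-negative, sums to one."""
    return all(pi >= 0 for pi in p) and abs(sum(p) - 1.0) <= tol


def draw_outcomes(p, n, seed=0):
    """Draw n outcomes (indices 0..len(p)-1) with probabilities p."""
    rng = random.Random(seed)
    return rng.choices(range(len(p)), weights=p, k=n)


# The example vectors from the slide's picture.
examples = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1/3, 1/3, 1/3), (1/4, 1/4, 1/2), (1/2, 1/2, 0)]

# Empirical frequencies should approach (1/4, 1/4, 1/2).
counts = [0, 0, 0]
for outcome in draw_outcomes((1/4, 1/4, 1/2), 10000):
    counts[outcome] += 1
```

A vector such as (1/2, 1/2, 0) is still valid: it simply never produces the third outcome.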
Generative story • TOPIC 1: computer, technology, system, service, site, phone, internet, machine • TOPIC 2: sell, sale, store, product, business, advertising, market, consumer • TOPIC 3: play, film, movie, theater, production, star, director, stage 49
Generative story [Figure: the example headlines arranged among TOPIC 1, TOPIC 2, and TOPIC 3] 50
Generative story • TOPIC 1: computer, technology, system, service, site, phone, internet, machine • TOPIC 2: sell, sale, store, product, business, advertising, market, consumer • TOPIC 3: play, film, movie, theater, production, star, director, stage • Example document: "Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's iTunes music store and other online services ..." 51
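The generative story can be written as a short simulation: draw the document's topic proportions θ, then for each word position draw a topic z from θ and a word from topic z. This is a simplified sketch, not the full model from Blei et al.: the three topics are fixed to the word lists above with every word in a topic equally likely (an assumption; in LDA each topic has its own word probabilities), and θ is drawn from a Dirichlet by normalizing independent Gamma draws. The names `TOPICS`, `sample_dirichlet`, `generate_document` and the value of α are hypothetical.

```python
import random

# Topic word lists from the slide; within a topic every word is equally
# likely here, which is a simplifying assumption.
TOPICS = [
    ["computer", "technology", "system", "service", "site", "phone", "internet", "machine"],
    ["sell", "sale", "store", "product", "business", "advertising", "market", "consumer"],
    ["play", "film", "movie", "theater", "production", "star", "director", "stage"],
]


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha, rng):
    """Return (theta, topic of each position, word at each position)."""
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    zs, words = [], []
    for _ in range(n_words):
        z = rng.choices(range(len(TOPICS)), weights=theta)[0]  # topic for this position
        zs.append(z)
        words.append(rng.choice(TOPICS[z]))  # word drawn from topic z
    return theta, zs, words


rng = random.Random(0)
theta, zs, words = generate_document(50, alpha=[0.5, 0.5, 0.5], rng=rng)
```

Inference in LDA runs this story in reverse: given only `words`, recover plausible `zs`, `theta`, and the topics themselves.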
Missing component: how to generate a multinomial distribution 55
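The missing component is the Dirichlet distribution, a distribution over the probability simplex, i.e., over exactly the non-negative vectors that sum to one. For reference, its standard density and mean for concentration parameters α_k > 0 are given below; the symbols θ and α are an assumption, chosen to match the usual LDA notation.

```latex
p(\theta \mid \alpha)
  = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)}
    \prod_{k=1}^{K} \theta_k^{\alpha_k - 1},
\qquad \theta_k \ge 0,\quad \sum_{k=1}^{K} \theta_k = 1,
\qquad \mathbb{E}[\theta_k] = \frac{\alpha_k}{\sum_{j=1}^{K} \alpha_j}.
```

Each draw θ is itself a multinomial parameter vector. Large α_k concentrate draws near the mean, while α_k < 1 pushes mass toward the corners of the simplex, giving sparse vectors. One standard way to sample it is to draw independent Gamma(α_k, 1) variables and divide by their sum.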
Recommend
More recommend