Department of Computer Science CSCI 5622: Machine Learning Chenhao - PowerPoint PPT Presentation

  1. Department of Computer Science CSCI 5622: Machine Learning Chenhao Tan Lecture 20: Topic modeling and variational inference Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

  2. Administrivia • Poster printing (stay tuned!) • HW 5 (final homework) is due next Friday! • Midpoint feedback

  3. Learning Objectives • Learn about latent Dirichlet allocation • Understand the intuition behind variational inference

  4. Topic models • Discrete count data

  5. Topic models • Suppose you have a huge number of documents • Want to know what's going on • Can't read them all (e.g. every New York Times article from the 90's) • Topic models offer a way to get a corpus-level view of major themes • Unsupervised

  6. Why should you care? • Neat way to explore/understand corpus collections (e-discovery, social media, scientific data) • NLP applications (word sense disambiguation, discourse segmentation) • Psychology: word meaning, polysemy • A general way to model count data and a general inference algorithm

  7. Conceptual approach • Input: a text corpus and number of topics K • Output: K topics, each topic is a list of words • Topic assignment for each document [Figure: example corpus of news headlines: "Forget the Bootleg, Just Download the Movie Legally"; "Multiplex Heralded As Linchpin To Growth"; "The Shape of Cinema, Transformed At the Click of a Mouse"; "A Peaceful Crew Puts Muppets Where Its Mouth Is"; "Stock Trades: A Better Deal For Investors Isn't Simple"; "The three big Internet portals begin to distinguish among themselves as shopping malls"; "Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens"]

  8. Conceptual approach • K topics, each topic is a list of words • TOPIC 1: computer, technology, system, service, site, phone, internet, machine • TOPIC 2: sell, sale, store, product, business, advertising, market, consumer • TOPIC 3: play, film, movie, theater, production, star, director, stage

  9. Conceptual approach • Topic assignment for each document [Figure: the example headlines arranged around three labeled topics: TOPIC 1 "TECHNOLOGY", TOPIC 2 "BUSINESS", TOPIC 3 "ENTERTAINMENT"]

  10. Topics from Science

  11. Topic models • Discrete count data • Gaussian distributions are not appropriate

  12. Generative model: Latent Dirichlet Allocation • Generate a document, or a bag of words • Blei, Ng, Jordan. Latent Dirichlet Allocation. JMLR, 2003.

  13. Generative model: Latent Dirichlet Allocation • Generate a document, or a bag of words • Multinomial distribution • Distribution over discrete outcomes • Represented by non-negative vector that sums to one • Picture representation [Figure: probability simplex with example points (1,0,0), (0,0,1), (0,1,0), (1/3,1/3,1/3), (1/4,1/4,1/2), (1/2,1/2,0)]

  14. Generative model: Latent Dirichlet Allocation • Generate a document, or a bag of words • Multinomial distribution • Distribution over discrete outcomes • Represented by non-negative vector that sums to one • Picture representation • Come from a Dirichlet distribution [Figure: probability simplex with example points (1,0,0), (0,0,1), (0,1,0), (1/3,1/3,1/3), (1/4,1/4,1/2), (1/2,1/2,0)]

  15. Generative story • TOPIC 1: computer, technology, system, service, site, phone, internet, machine • TOPIC 2: sell, sale, store, product, business, advertising, market, consumer • TOPIC 3: play, film, movie, theater, production, star, director, stage

  16. Generative story [Figure: the example headlines arranged around TOPIC 1, TOPIC 2, and TOPIC 3: "The three big Internet portals begin to distinguish among themselves as shopping malls"; "Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens"; "Stock Trades: A Better Deal For Investors Isn't Simple"; "Forget the Bootleg, Just Download the Movie Legally"; "The Shape of Cinema, Transformed At the Click of a Mouse"; "Multiplex Heralded As Linchpin To Growth"; "A Peaceful Crew Puts Muppets Where Its Mouth Is"]

  17. Generative story [Figure: the three topic word lists from slide 15 shown above an example article] Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's iTunes music store and other online services ...

  21. Missing component: how to generate a multinomial distribution

  22. Missing component: how to generate a multinomial distribution

  23. Missing component: how to generate a multinomial distribution
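
One common way to fill in this missing component is to draw the multinomial parameter vector from a Dirichlet distribution. The sketch below is illustrative only and is not taken from the slides: the helper name `sample_dirichlet`, the seed, and the two `alpha` settings are assumptions. It relies on the standard fact that independent Gamma(alpha_k, 1) draws, normalized by their sum, are distributed as Dir(alpha).

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dir(alpha) by normalizing Gamma draws."""
    # If g_k ~ Gamma(alpha_k, 1) independently, then g / sum(g) ~ Dir(alpha).
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

rng = random.Random(0)  # fixed seed, chosen only for reproducibility

# Small alpha: draws tend to put most mass on few outcomes (near a corner of the simplex).
sparse = sample_dirichlet([0.5, 0.5, 0.5], rng)
# Large alpha: draws tend to stay close to the uniform vector (1/3, 1/3, 1/3).
smooth = sample_dirichlet([10.0, 10.0, 10.0], rng)

print(sparse, smooth)
```

Each returned vector is non-negative and sums to one, so it can be used directly as the parameter of a multinomial distribution.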

  24. Conjugacy of Dirichlet and Multinomial • If $\phi \sim \mathrm{Dir}(\alpha)$, $w \sim \mathrm{Mult}(\phi)$, and $n_k = |\{ w_i : w_i = k \}|$, then $p(\phi \mid \alpha, w) \propto p(w \mid \phi)\, p(\phi \mid \alpha)$ (1)

  25. Conjugacy of Dirichlet and Multinomial • If $\phi \sim \mathrm{Dir}(\alpha)$, $w \sim \mathrm{Mult}(\phi)$, and $n_k = |\{ w_i : w_i = k \}|$, then $p(\phi \mid \alpha, w) \propto p(w \mid \phi)\, p(\phi \mid \alpha)$ (1) $\propto \prod_k \phi_k^{n_k} \prod_k \phi_k^{\alpha_k - 1}$ (2) $\propto \prod_k \phi_k^{\alpha_k + n_k - 1}$ (3) • Conjugacy: this posterior has the same form as the prior
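
Read as an update rule, line (3) says the posterior is again a Dirichlet whose parameters are the prior parameters plus the observed counts, $\alpha_k + n_k$. A minimal sketch of that update follows; the function name `dirichlet_posterior` and the toy prior and observations are assumptions made for illustration.

```python
from collections import Counter

def dirichlet_posterior(alpha, words):
    """Return the posterior Dirichlet parameters alpha_k + n_k."""
    counts = Counter(words)  # n_k = number of observations equal to outcome k
    return [a + counts.get(k, 0) for k, a in enumerate(alpha)]

alpha = [1.0, 1.0, 1.0]   # symmetric prior over 3 outcomes (illustrative)
words = [0, 2, 2, 1, 2]   # observed outcomes, coded as integers 0..2
posterior = dirichlet_posterior(alpha, words)
print(posterior)  # [2.0, 2.0, 4.0]
```

Outcome 2 was seen three times, so its parameter grows the most; outcomes that were never seen would keep their prior value.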

  26. Making the generative story formal [Figure: plate diagram with nodes $\lambda$, $\beta_k$ (plate K), $\alpha$, $\theta_d$, $z_n$, $w_n$ (plates N and M)] • For each topic $k \in \{1, \ldots, K\}$, draw a multinomial distribution $\beta_k$ from a Dirichlet distribution with parameter $\lambda$

  27. Making the generative story formal [Figure: plate diagram with nodes $\lambda$, $\beta_k$ (plate K), $\alpha$, $\theta_d$, $z_n$, $w_n$ (plates N and M)] • For each topic $k \in \{1, \ldots, K\}$, draw a multinomial distribution $\beta_k$ from a Dirichlet distribution with parameter $\lambda$ • For each document $d \in \{1, \ldots, M\}$, draw a multinomial distribution $\theta_d$ from a Dirichlet distribution with parameter $\alpha$

  28. Making the generative story formal [Figure: plate diagram with nodes $\lambda$, $\beta_k$ (plate K), $\alpha$, $\theta_d$, $z_n$, $w_n$ (plates N and M)] • For each topic $k \in \{1, \ldots, K\}$, draw a multinomial distribution $\beta_k$ from a Dirichlet distribution with parameter $\lambda$ • For each document $d \in \{1, \ldots, M\}$, draw a multinomial distribution $\theta_d$ from a Dirichlet distribution with parameter $\alpha$ • For each word position $n \in \{1, \ldots, N\}$, select a hidden topic $z_n$ from the multinomial distribution parameterized by $\theta_d$.

  29. Making the generative story formal [Figure: plate diagram with nodes $\lambda$, $\beta_k$ (plate K), $\alpha$, $\theta_d$, $z_n$, $w_n$ (plates N and M)] • For each topic $k \in \{1, \ldots, K\}$, draw a multinomial distribution $\beta_k$ from a Dirichlet distribution with parameter $\lambda$ • For each document $d \in \{1, \ldots, M\}$, draw a multinomial distribution $\theta_d$ from a Dirichlet distribution with parameter $\alpha$ • For each word position $n \in \{1, \ldots, N\}$, select a hidden topic $z_n$ from the multinomial distribution parameterized by $\theta_d$. • Choose the observed word $w_n$ from the distribution $\beta_{z_n}$.
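
The four steps of the generative story can be sketched as a small simulation. This is a hedged illustration rather than code from the lecture: the function names, the toy vocabulary, the fixed document length N, and the use of symmetric scalar values for `alpha` and `lam` (standing in for $\alpha$ and $\lambda$) are all assumptions.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dir(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

def sample_index(probs, rng):
    """Draw one index from the discrete distribution given by probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(vocab, K, M, N, alpha, lam, seed=0):
    """Simulate the generative story with symmetric priors (an assumption)."""
    rng = random.Random(seed)
    V = len(vocab)
    # Step 1: for each topic k, draw a word distribution beta_k ~ Dir(lam).
    beta = [sample_dirichlet([lam] * V, rng) for _ in range(K)]
    docs = []
    for _ in range(M):
        # Step 2: for each document d, draw topic proportions theta_d ~ Dir(alpha).
        theta = sample_dirichlet([alpha] * K, rng)
        words = []
        for _ in range(N):
            z = sample_index(theta, rng)    # Step 3: hidden topic z_n ~ Mult(theta_d)
            w = sample_index(beta[z], rng)  # Step 4: observed word w_n ~ Mult(beta_{z_n})
            words.append(vocab[w])
        docs.append(words)
    return beta, docs

vocab = ["computer", "internet", "sell", "market", "film", "movie", "phone", "store"]
beta, docs = generate_corpus(vocab, K=3, M=4, N=10, alpha=0.5, lam=0.5)
for doc in docs:
    print(" ".join(doc))
```

Only `docs` would be observed in practice; `beta`, each `theta`, and each `z` are the hidden quantities that inference has to recover.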

  30. Topic models: What’s important • Topic models (latent variables) • Topics to word types: multinomial distribution • Documents to topics: multinomial distribution • Modeling & Algorithm • Model: story of how your data came to be • Latent variables: missing pieces of your story • Statistical inference: filling in those missing pieces • We use latent Dirichlet allocation (LDA), a fully Bayesian version of pLSI, which is itself a probabilistic version of LSA

  31. Which variables are hidden?

  32. Size of Variable

  33. Joint distribution

  34. Joint distribution
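
The slide's own equation did not survive in this transcript. Under the generative story of slides 26 to 29, the joint distribution would factor as sketched below. The document index $d$ on $z_{d,n}$ and $w_{d,n}$ is notation added here as an assumption (the slides write $z_n$ and $w_n$ within a single document), and a common length $N$ is assumed for every document.

$$
p(\beta, \theta, z, w \mid \alpha, \lambda)
= \prod_{k=1}^{K} p(\beta_k \mid \lambda)
\prod_{d=1}^{M} \left[ p(\theta_d \mid \alpha)
\prod_{n=1}^{N} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \beta_{z_{d,n}}) \right]
$$

Each factor corresponds to one step of the generative story: the first two are Dirichlet densities and the last two are multinomial probabilities.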

  35. Posterior distribution
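
The slide's equation is missing from this transcript; the following is a sketch of the standard form, writing $p(\beta, \theta, z, w \mid \alpha, \lambda)$ for the joint distribution of hidden and observed variables. By Bayes' rule the posterior over the hidden variables is

$$
p(\beta, \theta, z \mid w, \alpha, \lambda)
= \frac{p(\beta, \theta, z, w \mid \alpha, \lambda)}{p(w \mid \alpha, \lambda)},
\qquad
p(w \mid \alpha, \lambda) = \int\!\!\int \sum_{z} p(\beta, \theta, z, w \mid \alpha, \lambda)\, d\theta\, d\beta .
$$

The normalizer sums over every topic assignment and integrates over $\theta$ and $\beta$ jointly, which is intractable to compute exactly for LDA; this is why an approximate method is needed.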

  36. Variational inference

  37. KL divergence and evidence lower bound

  38. KL divergence and evidence lower bound
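
The derivation on these slides is not present in the transcript. A sketch of the usual argument follows, using generic notation chosen here as an assumption: $x$ for the observed data, $Z$ for all hidden variables, and $q(Z)$ for the variational distribution.

$$
\begin{aligned}
\mathrm{KL}\big(q(Z) \,\|\, p(Z \mid x)\big)
&= \mathbb{E}_q[\log q(Z)] - \mathbb{E}_q[\log p(Z \mid x)] \\
&= \mathbb{E}_q[\log q(Z)] - \mathbb{E}_q[\log p(x, Z)] + \log p(x)
\end{aligned}
$$

Rearranging gives

$$
\log p(x) = \underbrace{\mathbb{E}_q[\log p(x, Z)] - \mathbb{E}_q[\log q(Z)]}_{\mathrm{ELBO}(q)} + \mathrm{KL}\big(q(Z) \,\|\, p(Z \mid x)\big).
$$

Because the KL term is non-negative and $\log p(x)$ does not depend on $q$, maximizing the ELBO over $q$ is equivalent to minimizing the KL divergence to the true posterior.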

  39. A different way to get ELBO • Jensen’s inequality
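
The steps are not shown in the transcript, so here is a sketch of the standard Jensen's inequality route. The notation is an assumption: $x$ is the observed data, $z$ the hidden variables (an integral is written, to be read as a sum where $z$ is discrete), and $q(z)$ any distribution that is positive wherever $p(x, z)$ is.

$$
\log p(x)
= \log \int p(x, z)\, dz
= \log \mathbb{E}_q\!\left[ \frac{p(x, Z)}{q(Z)} \right]
\;\ge\; \mathbb{E}_q\!\left[ \log \frac{p(x, Z)}{q(Z)} \right]
= \mathbb{E}_q[\log p(x, Z)] - \mathbb{E}_q[\log q(Z)]
$$

The inequality holds because $\log$ is concave, and it is tight exactly when $p(x, Z)/q(Z)$ is constant in $Z$, that is, when $q(Z) = p(Z \mid x)$.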

  40. Evidence Lower Bound

  41. Evidence Lower Bound
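
As a numerical sanity check of the bound, the sketch below evaluates the ELBO on a toy model with a single binary hidden variable. The model, its probabilities, and the function names `elbo` and `kl` are made up for illustration and are not from the lecture.

```python
import math

def elbo(q, joint):
    """ELBO(q) = sum_z q(z) * (log p(x, z) - log q(z)) for a discrete hidden z."""
    return sum(qz * (math.log(pxz) - math.log(qz))
               for qz, pxz in zip(q, joint) if qz > 0)

def kl(q, p):
    """KL(q || p) for two discrete distributions on the same support."""
    return sum(qz * (math.log(qz) - math.log(pz))
               for qz, pz in zip(q, p) if qz > 0)

# Toy model with made-up numbers: p(z) = [0.3, 0.7], p(x | z) = [0.9, 0.2] for one fixed x.
joint = [0.3 * 0.9, 0.7 * 0.2]             # p(x, z) for z = 0 and z = 1
evidence = sum(joint)                      # p(x)
log_evidence = math.log(evidence)
posterior = [j / evidence for j in joint]  # p(z | x)

q = [0.5, 0.5]                             # an arbitrary variational guess
gap = log_evidence - elbo(q, joint)        # equals KL(q || posterior)

print("ELBO(q)         =", elbo(q, joint))
print("log p(x)        =", log_evidence)
print("gap             =", gap, " KL =", kl(q, posterior))
print("ELBO(posterior) =", elbo(posterior, joint))
```

The ELBO for the arbitrary guess falls below the log evidence by exactly the KL divergence to the posterior, and the gap closes when `q` is set to the posterior itself.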
