


  1. Machine Learning 2, DS 4420 (Spring 2020): Topic Modeling 1. Byron C. Wallace

  2. Last time: Clustering → Mixture Models → Expectation Maximization (EM)

  3. Today: Topic models

  4. Mixture models. Assume we are given data consisting of $N$ fully unsupervised examples in $M$ dimensions.
     Data: $\mathcal{D} = \{ x^{(i)} \}_{i=1}^{N}$ where $x^{(i)} \in \mathbb{R}^M$
     Generative story: $z \sim \mathrm{Multinomial}(\phi)$, then $x \sim p_\theta(\cdot \mid z)$
     Joint: $p_{\theta, \phi}(x, z) = p_\theta(x \mid z)\, p_\phi(z)$
     Marginal: $p_{\theta, \phi}(x) = \sum_{z=1}^{K} p_\theta(x \mid z)\, p_\phi(z)$
     (Marginal) log-likelihood: $\ell(\theta) = \sum_{i=1}^{N} \log p_{\theta, \phi}(x^{(i)}) = \sum_{i=1}^{N} \log \sum_{z=1}^{K} p_\theta(x^{(i)} \mid z)\, p_\phi(z)$
     Slide credit: Matt Gormley and Eric Xing (CMU)
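To make the marginal log-likelihood concrete, here is a minimal numpy/scipy sketch that assumes Gaussian component densities for $p_\theta(\cdot \mid z)$ (the slide leaves the component family abstract); the toy data and parameters are made up, and logsumexp is used to avoid underflow when summing over components in log space.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def mixture_log_likelihood(X, means, covs, phi):
    """Marginal log-likelihood: ell = sum_i log sum_z p_theta(x_i | z) p_phi(z),
    assuming Gaussian components p_theta(. | z) = N(means[z], covs[z])."""
    N, K = X.shape[0], len(phi)
    log_joint = np.empty((N, K))
    for z in range(K):
        # log p_theta(x_i | z) + log p_phi(z) for every example i
        log_joint[:, z] = multivariate_normal.logpdf(X, mean=means[z], cov=covs[z]) + np.log(phi[z])
    # log-sum-exp over components gives log p(x_i); summing over i gives ell
    return logsumexp(log_joint, axis=1).sum()

# Toy usage: N = 100 points in M = 2 dimensions, K = 2 components
X = np.random.default_rng(0).normal(size=(100, 2))
means = [np.zeros(2), np.ones(2)]
covs = [np.eye(2), np.eye(2)]
print(mixture_log_likelihood(X, means, covs, phi=np.array([0.5, 0.5])))
```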


  8. Naive Bayes. The model:
     $p(c \mid w_{1:N}, \pi, \theta) \propto p(c \mid \pi) \prod_{n=1}^{N} p(w_n \mid \theta_c)$
     $p(\mathcal{D} \mid \theta_{1:C}, \pi) = \prod_{d=1}^{D} \left( p(c_d \mid \pi) \prod_{n=1}^{N} p(w_n \mid \theta_{c_d}) \right)$
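A small sketch of the classification rule above, computed in log space: score each class $c$ by $\log p(c \mid \pi) + \sum_n \log p(w_n \mid \theta_c)$ and normalize. The function name, toy parameters, and vocabulary indices are made up for illustration.

```python
import numpy as np

def naive_bayes_posterior(word_ids, log_pi, log_theta):
    """Posterior over classes: p(c | w_{1:N}) proportional to p(c | pi) * prod_n p(w_n | theta_c).
    log_pi: (C,) class log-priors; log_theta: (C, V) per-class word log-probabilities."""
    # Unnormalized log-posterior for each class
    scores = log_pi + log_theta[:, word_ids].sum(axis=1)
    # Normalize in log space, then exponentiate
    return np.exp(scores - np.logaddexp.reduce(scores))

# Toy example: 2 classes, vocabulary of 4 word types, document = word ids [0, 2, 2]
log_pi = np.log([0.5, 0.5])
log_theta = np.log([[0.4, 0.3, 0.2, 0.1],
                    [0.1, 0.2, 0.3, 0.4]])
print(naive_bayes_posterior([0, 2, 2], log_pi, log_theta))
```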

  9. (Soft) EM
     Initialize parameters randomly; while not converged:
     1. E-step: create one training example for each possible value of the latent variables; weight each example according to the model's confidence; treat the parameters as observed.
     2. M-step: set the parameters to the values that maximize likelihood; treat the pseudo-counts from above as observed.
     Slide credit: Matt Gormley and Eric Xing (CMU)

  10. And for NB: for EM, the expected (soft) count of word type $t$ in class $c$ is $\sum_i p(z_i = c \mid x^{(i)}) \cdot \mathrm{count}(x^{(i)}, t)$; the M-step sets $p(t \mid c)$ to this expected count divided by the expected total token count for class $c$.
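A minimal sketch of soft EM for the unsupervised Naive Bayes (multinomial mixture) setting described in slides 9 and 10, assuming documents arrive as a bag-of-words count matrix; the smoothing constant, random initialization, and fixed iteration count are illustrative choices, not from the slides.

```python
import numpy as np

def soft_em_multinomial_mixture(X, K, n_iters=50, eps=1e-10):
    """X: (N, V) word-count matrix. Returns mixing weights phi (K,) and
    per-class word distributions theta (K, V)."""
    N, V = X.shape
    rng = np.random.default_rng(0)
    phi = np.full(K, 1.0 / K)
    theta = rng.dirichlet(np.ones(V), size=K)           # random init of p(word | z)

    for _ in range(n_iters):
        # E-step: responsibilities r[i, z] = p(z | x_i), computed in log space
        log_r = np.log(phi) + X @ np.log(theta + eps).T  # (N, K)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: re-estimate parameters from expected (soft) counts
        phi = r.mean(axis=0)                             # p(z)
        expected_counts = r.T @ X                        # (K, V) soft word counts per class
        theta = (expected_counts + eps) / (expected_counts.sum(axis=1, keepdims=True) + V * eps)

    return phi, theta
```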

  11. TOPIC MODELS. Some content borrowed from David Blei (Columbia)

  12. Topic Models: Motivation
      • Suppose we have a giant dataset ("corpus") of text, e.g., all of the NYTimes or all emails from a company
      • Cannot read all documents
      • But want to get a sense of what they contain

  13. Topic Models: Motivation
      • Topic models are a way of uncovering, well, "topics" (themes) in a set of documents
      • Topic models are unsupervised
      • Can be viewed as a type of clustering, so follows naturally from prior lectures; will come back to this.


  15. Topic Models: Motivation
      • Topic models are a way of uncovering, well, "topics" (themes) in a set of documents
      • Topic models are unsupervised
      • Can be viewed as a sort of soft clustering of documents into topics.

  16. Example topics (top words per topic):
      Topic 1: the, "number", in, to, which, and, a, this, as, there
      Topic 2: i, is, the, the, to, i, of, "number", you, is
      Topic 3: that, proteins, a, of, have, with, if, metaphorical, and, run
      Topic 4: easter, ishtar, satan, the, espn, hockey, but, english, evil, fact
      Example from Wallach, 2006

  17. Key outputs
      • Topics: distributions over words; we hope these are somehow thematically coherent
      • Document-topics: probabilistic assignments of topics to documents
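As a concrete, hedged illustration of these two outputs, here is a sketch using scikit-learn's LatentDirichletAllocation; the toy corpus and the choice of two topics are made up, and normalizing components_ row-wise is one reasonable way to read off the topic-word distributions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "gene dna genetic sequencing",
    "brain neuron nerve cortex",
    "dna gene brain neuron",
]

# Bag-of-words counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)          # (n_docs, K): document-topic proportions

# Topics: normalize rows of components_ to get distributions over words
topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

vocab = vectorizer.get_feature_names_out()
for k, dist in enumerate(topic_word):
    top = dist.argsort()[::-1][:3]
    print(f"topic {k}:", [vocab[i] for i in top])
print("document-topic proportions:\n", doc_topics)
```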

  18. Example: Enron emails https://en.wikipedia.org/wiki/Enron_scandal https://www.cs.cmu.edu/~enron/

  19. Example: Enron emails
      Topic 3:  trading, financial, trade, product, price
      Topic 6:  gas, capacity, deal, pipeline, contract
      Topic 9:  state, california, davis, power, utilities
      Topic 14: ferc, issue, order, party, case
      Topic 22: group, meeting, team, process, plan
      Example from Boyd-Graber, Hu and Mimno, 2017
      https://en.wikipedia.org/wiki/Enron_scandal

  20. Document-topic probabilities
      Yesterday, SDG&E filed a motion for adoption of an electric procurement cost recovery mechanism and for an order shortening time for parties to file comments on the mechanism. The attached email from SDG&E contains the motion, an executive summary, and a detailed summary of their proposals and recommendations governing procurement of the net short energy requirements for SDG&E's customers. The utility requests a 15-day comment period, which means comments would have to be filed by September 10 (September 8 is a Saturday). Reply comments would be filed 10 days later.
      Topic 9:  probability 0.42
      Topic 11: probability 0.05
      Topic 8:  probability 0.05
      Example from Boyd-Graber, Hu and Mimno, 2017

  21. Topics as Matrix Factorization
      • One can view topics as a kind of matrix factorization:
        Topic Assignments (M × K) × Topics (K × V) ≈ Dataset (M × V)
      Figure from Boyd-Graber, Hu and Mimno, 2017

  22. Topics as Matrix Factorization
      • One can view topics as a kind of matrix factorization:
        Topic Assignments (M × K) × Topics (K × V) ≈ Dataset (M × V)
      • We will try to take a more probabilistic view, but it is useful to keep this in mind
      Figure from Boyd-Graber, Hu and Mimno, 2017
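A hedged sketch of the factorization picture above, using scikit-learn's NMF purely as a stand-in (the models discussed in this lecture are probabilistic, not NMF); the random count matrix and the chosen sizes are made up so the shapes match the figure.

```python
import numpy as np
from sklearn.decomposition import NMF

M, V, K = 20, 50, 4                                 # documents, vocabulary size, topics
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(M, V)).astype(float)     # toy document-term count matrix (M x V)

nmf = NMF(n_components=K, init="nndsvda", random_state=0, max_iter=500)
doc_topic = nmf.fit_transform(X)                    # (M x K): per-document topic weights
topic_word = nmf.components_                        # (K x V): per-topic word weights

# The product of the two factors approximately reconstructs the dataset
print(doc_topic.shape, topic_word.shape, (doc_topic @ topic_word).shape)  # (20, 4) (4, 50) (20, 50)
```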

  23. Probabilistic Word Mixtures
      Idea: Model text as a mixture over words (ignore order)
      Example topics (distributions over words):
        gene 0.04, dna 0.02, genetic 0.01, ...
        life 0.02, evolve 0.01, organism 0.01, ...
        brain 0.04, neuron 0.02, nerve 0.01, ...
        data 0.02, number 0.02, computer 0.01, ...

  24. Topic Modeling
      Idea: Model a corpus of documents with shared topics
      Figure: Topics (shared), Topic proportions and assignments (document-specific), Words in document (mixture over topics); example topics: gene/dna/genetic, life/evolve/organism, brain/neuron/nerve, data/number/computer

  25. Topic Modeling
      Figure: Topics (shared), Topic proportions and assignments (document-specific), Words in document (mixture over topics)
      • Each topic is a distribution over words
      • Each document is a mixture over topics
      • Each word is drawn from one topic distribution

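A minimal sketch of the generative story in slide 25, with fixed, made-up topics and topic proportions: each word first draws a topic assignment from the document's proportions, then a word from that topic's distribution over the vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["gene", "dna", "genetic", "brain", "neuron", "nerve"]
# Two topics, each a distribution over the vocabulary (rows sum to 1)
topics = np.array([
    [0.40, 0.30, 0.20, 0.05, 0.03, 0.02],   # a "genetics"-flavored topic
    [0.05, 0.03, 0.02, 0.40, 0.30, 0.20],   # a "neuroscience"-flavored topic
])
doc_topic_proportions = np.array([0.7, 0.3])  # document-specific mixture over topics

def generate_document(n_words=10):
    words = []
    for _ in range(n_words):
        z = rng.choice(len(topics), p=doc_topic_proportions)  # draw a topic assignment
        w = rng.choice(len(vocab), p=topics[z])               # draw a word from that topic
        words.append(vocab[w])
    return words

print(generate_document())
```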
