Mixed Membership Markov Models for Unsupervised Conversation Modeling
Michael J. Paul, Johns Hopkins University

Conversation Modeling: High Level Idea
We'll be modeling sequences of documents
  • e.g. a sequence of email messages from a conversation
We'll use M4 = Mixed Membership Markov Models
M4 is a combination of
  • Topic models (LDA, PLSA, etc.)
      • Documents are mixtures of latent classes/topics
  • Hidden Markov models
      • Documents in a sequence depend on the previous document
M.J. Paul. Mixed Membership Markov Models. EMNLP 2012. Jeju Island, Korea.

Generative Models of Text
Some distinctions to consider…

                            Inter-document structure
Intra-document structure    Independent     Markov
Single-Class                Naïve Bayes     HMM
Mixed-Membership            LDA             This talk! :)

Overview
Unsupervised Content Models
  • Naïve Bayes
  • Topic Models
Unsupervised Conversation Modeling
  • Hidden Markov Models
Mixed Membership Markov Models (M4)
  • Overview
  • Inference
Experiments with Conversation Data
  • Thread reconstruction
  • Speech act induction

Motivation: Unsupervised Models
  • Huge amounts of unstructured and unannotated data on the Web
  • Unsupervised models can help manage this data and are robust to variations in language and genre
  • Tools like topic models can uncover interesting patterns in large corpora

(Unsupervised) Naïve Bayes
  • Each document belongs to some category/class z
  • Each class z is associated with its own distribution over words
[Figure: plate diagram for Doc 1, Doc 2, Doc 3. θ = class distribution; z = class (one per document); w = words (N per document)]

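The generative story on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_naive_bayes_doc`, the dictionary layout of `theta` and `phi`, and the toy probabilities are hypothetical and do not come from the talk.

```python
import random

def sample_naive_bayes_doc(theta, phi, n_words, rng):
    """Draw ONE class for the whole document, then draw every word from that class."""
    classes = list(theta)
    z = rng.choices(classes, weights=[theta[c] for c in classes], k=1)[0]
    vocab = list(phi[z])
    words = rng.choices(vocab, weights=[phi[z][w] for w in vocab], k=n_words)
    return z, words

# Toy parameters (hypothetical values chosen for illustration only).
theta = {"SPORTS": 0.5, "CRIME": 0.5}          # class distribution
phi = {                                        # per-class word distributions
    "SPORTS": {"football": 0.6, "team": 0.4},
    "CRIME": {"court": 0.5, "police": 0.5},
}

rng = random.Random(0)
z, words = sample_naive_bayes_doc(theta, phi, n_words=5, rng=rng)
print(z, words)
```

Because a single `z` is drawn per document, every word in the sample comes from the same class, which is the single-class restriction the following slides question.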
(Unsupervised) Naïve Bayes
Imaginary class labels, each with a probability distribution over words:
"SPORTS":    football 0.03, team 0.01, hockey 0.01, baseball 0.005, …
"CRIME":     charge 0.02, court 0.02, police 0.015, robbery 0.01, …
"POLITICS":  congress 0.02, president 0.02, election 0.015, senate 0.01, …

(Unsupervised) Naïve Bayes
football 0.03, team 0.01, hockey 0.01, baseball 0.005, …
charge 0.02, court 0.02, police 0.015, robbery 0.01, …
congress 0.02, president 0.02, election 0.015, senate 0.01, …

(Unsupervised) Naïve Bayes?
What if an article belongs to more than one category?
football 0.03, team 0.01, hockey 0.01, baseball 0.005, …
charge 0.02, court 0.02, police 0.015, robbery 0.01, …
congress 0.02, president 0.02, election 0.015, senate 0.01, …

(Unsupervised) Naïve Bayes?
Example article: "Jury Finds Baseball Star Roger Clemens Not Guilty On All Counts"
A jury found baseball star Roger Clemens not guilty on six charges against him. Clemens was accused of lying to Congress in 2008 about his use of performance enhancing drugs.
football 0.03, team 0.01, hockey 0.01, baseball 0.005, …
charge 0.02, court 0.02, police 0.015, robbery 0.01, …
congress 0.02, president 0.02, election 0.015, senate 0.01, …

Topic Models
[Figure: Doc 1, Doc 2, Doc 3 shown alongside the word distributions below]
football 0.03, team 0.01, hockey 0.01, baseball 0.005, …
charge 0.02, court 0.02, police 0.015, robbery 0.01, …
congress 0.02, president 0.02, election 0.015, senate 0.01, …

Topic Models
  • One class distribution θ_d per document
  • One class value per token (rather than per document)
[Figure: plate diagram for Doc 1, Doc 2, Doc 3. θ = per-document class distribution; z = class (one per token); w = words (N per document)]
T. Hofmann. Probabilistic Latent Semantic Indexing. SIGIR 1999.

Latent Dirichlet Allocation (LDA)
  • One class distribution θ_d per document
  • One class value per token (rather than per document)
[Figure: plate diagram for Doc 1, Doc 2, Doc 3. α = Dirichlet prior; θ = per-document class distribution; z = class (one per token); w = words (N per document)]
D. Blei, A. Ng, M. Jordan. Latent Dirichlet Allocation. JMLR 2003.

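A minimal sketch of the mixed-membership generative story, assuming a symmetric Dirichlet prior with a single concentration value `alpha`. The helper names `sample_dirichlet` and `sample_lda_doc`, the data layout, and the toy numbers are hypothetical; the Dirichlet draw uses the standard trick of normalizing independent Gamma samples.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Symmetric Dirichlet draw via normalized Gamma(alpha, 1) samples."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [x / total for x in draws]

def sample_lda_doc(alpha, phi, n_words, rng):
    """Draw theta_d for the document, then a class z and a word w for every token."""
    classes = list(phi)
    theta_d = sample_dirichlet(alpha, len(classes), rng)
    tokens = []
    for _ in range(n_words):
        z = rng.choices(classes, weights=theta_d, k=1)[0]   # one class per token
        vocab = list(phi[z])
        w = rng.choices(vocab, weights=[phi[z][v] for v in vocab], k=1)[0]
        tokens.append((z, w))
    return theta_d, tokens

# Toy word distributions (hypothetical values for illustration only).
phi = {
    "SPORTS": {"football": 0.6, "team": 0.4},
    "CRIME": {"court": 0.5, "police": 0.5},
    "POLITICS": {"congress": 0.5, "senate": 0.5},
}

rng = random.Random(1)
theta_d, tokens = sample_lda_doc(alpha=0.5, phi=phi, n_words=8, rng=rng)
print(theta_d, tokens)
```

Unlike the single-class case, one document can now contain tokens from several classes, which is what lets an article be about both sports and crime.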
Overview
  • Unsupervised Content Models
  • Unsupervised Conversation Modeling
  • Mixed Membership Markov Models
  • Experiments with Conversation Data
  • Conclusion

Conversation Modeling
Documents on the web are more complicated than news articles

Conversation Modeling
What's missing from Naïve Bayes and LDA?
  • They assume documents are generated independently of each other
Messages in conversations aren't at all independent
  • Doesn't make sense to pretend that they are
  • But we'd like to represent this dependence in a reasonably simple way
Solution: Hidden Markov Models

Block HMM
  • Message emitted at each time step of Markov chain
  • Each message in thread depends on the message to which it is a response
[Figure: chain over Message 1, Message 2, Message 3. π = transition parameters (matrix); z = class (one per message); w = words (N per message)]

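The block HMM can be sketched as follows, under simplifying assumptions that are not in the slides: the thread is a linear chain in which each message replies to the one before it, and the first message is drawn from a separate `initial` distribution. All names and numbers are hypothetical.

```python
import random

def sample_block_hmm_thread(initial, transition, phi, n_messages, n_words, rng):
    """One class per message; the class depends only on the parent message's class."""
    classes = list(initial)
    thread = []
    prev = None
    for _ in range(n_messages):
        dist = initial if prev is None else transition[prev]
        z = rng.choices(classes, weights=[dist[c] for c in classes], k=1)[0]
        vocab = list(phi[z])
        # The whole message (a "block" of words) is emitted from the single class z.
        words = rng.choices(vocab, weights=[phi[z][w] for w in vocab], k=n_words)
        thread.append((z, words))
        prev = z
    return thread

# Toy parameters (hypothetical values for illustration only).
initial = {"GREETING": 0.8, "QUESTION": 0.2}
transition = {
    "GREETING": {"GREETING": 0.3, "QUESTION": 0.7},
    "QUESTION": {"GREETING": 0.1, "QUESTION": 0.9},
}
phi = {
    "GREETING": {"hey": 0.6, "hi": 0.4},
    "QUESTION": {"what": 0.5, "how": 0.5},
}

rng = random.Random(2)
thread = sample_block_hmm_thread(initial, transition, phi, n_messages=3, n_words=4, rng=rng)
print(thread)
```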
Bayesian Block HMM
  • Each message in thread depends on the message to which it is a response
[Figure: chain over Message 1, Message 2, Message 3. α = Dirichlet prior; π = transition parameters; z = class (one per message); w = words (N per message)]
A. Ritter, C. Cherry, B. Dolan. Unsupervised Modeling of Twitter Conversations. HLT-NAACL 2010.

Block HMM
GREETING:  hey 0.1, sup 0.06, hi 0.04, hello 0.01, …
QUESTION:  what 0.03, what's 0.025, how 0.02, is 0.02, …
LAUGHTER:  lol 0.04, haha 0.04, :) 0.03, lmao 0.01, …
SPORTS:    football 0.03, team 0.01, hockey 0.01, baseball 0.005, …
CRIME:     charge 0.02, court 0.02, police 0.015, robbery 0.01, …
POLITICS:  congress 0.02, president 0.02, election 0.015, senate 0.01, …

Block HMM
Nice and simple way to model dependencies between messages
This is similar to Naïve Bayes
  • One class per document!
Let's make it more like LDA
  • Documents are mixtures of classes

Generative Models of Text

                            Inter-document structure
Intra-document structure    Independent     Markov
Single-Class
Mixed-Membership                            This talk! :)

Overview
  • Unsupervised Content Models
  • Unsupervised Conversation Modeling
  • Mixed Membership Markov Models
  • Experiments with Conversation Data
  • Conclusion

Mixed Membership Markov Models (M4)
  • Like LDA
      • One distribution π_d per doc
      • One class z per token
  • But now each message's distribution depends on the class assignments of the previous message
[Figure: chain over Message 1, Message 2, Message 3. Λ = transition parameters; π = class distribution (function of z and λ); z = class (one per token); w = words (N per message)]

Mixed Membership Markov Models (M4)
Probability of class j in message d:
$\pi_{dj} \propto \exp(\lambda_j^{T} z_{d-1})$
This is a log-linear function of the previous message.
[Figure: chain over Message 1, Message 2, Message 3. Λ = transition parameters; π = class distribution (function of z and λ); z = class; w = words (N per message)]

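The log-linear transition can be sketched in Python as below. Several details are assumptions rather than facts from the slides: z_{d-1} is taken to be the vector of per-class token counts in the previous message (normalized proportions would be an equally plausible reading), the first message is given a zero feature vector so its π is uniform, and the thread is a linear chain. The function names and toy values are hypothetical.

```python
import math
import random

def class_distribution(lam, prev_counts):
    """pi_dj proportional to exp(lambda_j . z_{d-1}); lam[j] is the weight vector for class j."""
    scores = [sum(w * c for w, c in zip(row, prev_counts)) for row in lam]
    top = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_m4_thread(lam, phi, n_messages, n_words, rng):
    """Each message gets its own pi_d, computed from the previous message's token classes."""
    n_classes = len(lam)
    prev_counts = [0] * n_classes           # assumption: no parent, so pi is uniform
    thread = []
    for _ in range(n_messages):
        pi_d = class_distribution(lam, prev_counts)
        counts = [0] * n_classes
        tokens = []
        for _ in range(n_words):
            z = rng.choices(range(n_classes), weights=pi_d, k=1)[0]  # one class per token
            vocab = list(phi[z])
            w = rng.choices(vocab, weights=[phi[z][v] for v in vocab], k=1)[0]
            counts[z] += 1
            tokens.append((z, w))
        thread.append((pi_d, tokens))
        prev_counts = counts
    return thread

# Toy parameters (hypothetical values for illustration only).
lam = [[1.0, -0.5], [-0.5, 1.0]]            # lam[j][k]: effect of class k on class j
phi = [{"hey": 0.6, "hi": 0.4}, {"what": 0.5, "how": 0.5}]

rng = random.Random(3)
thread = sample_m4_thread(lam, phi, n_messages=3, n_words=6, rng=rng)
for pi_d, tokens in thread:
    print(pi_d, tokens)
```

The transition reads the sampled class assignments of the previous message, not its distribution π, so the dependence runs from the z values of one message to the π of the next.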
Mixed Membership Markov Models (M4)
  • Why not transition directly from π to π?
  • Makes more sense for next message to depend on actual classes of previous message (not the distribution over all possible classes)
[Figure: chain over Message 1, Message 2, Message 3 with Λ, π, z, w (N per message)]

Example
Suppose documents are mixtures of 4 classes: G R B
Then Λ is a 4x4 matrix with values such as:
  • $\lambda_{G \to R} = -0.2$: "The presence of G in doc 1 slightly decreases the likelihood of having R in doc 2"
  • $\lambda_{B \to B} = 5.0$: "The presence of B in doc 1 greatly increases the likelihood of having B in doc 2"

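To make the two example weights concrete, the sketch below plugs them into a softmax over next-message classes. It is an illustration only: it uses just the three classes named on the slide, sets every other entry of Λ to 0, and treats the previous message as containing a single token of one class. Those choices, and the names `lam` and `next_distribution`, are assumptions.

```python
import math

CLASSES = ["G", "R", "B"]  # only the classes named on the slide

# lam[(prev, nxt)]: weight of class `prev` in the previous message on class `nxt`.
# The two non-zero values are the slide's; all other entries are ASSUMED to be 0.
lam = {(p, n): 0.0 for p in CLASSES for n in CLASSES}
lam[("G", "R")] = -0.2
lam[("B", "B")] = 5.0

def next_distribution(prev_counts):
    """Softmax over next-message classes given class counts in the previous message."""
    scores = {n: sum(lam[(p, n)] * prev_counts.get(p, 0) for p in CLASSES) for n in CLASSES}
    top = max(scores.values())
    exps = {n: math.exp(s - top) for n, s in scores.items()}
    total = sum(exps.values())
    return {n: e / total for n, e in exps.items()}

after_g = next_distribution({"G": 1})   # previous message had one G token
after_b = next_distribution({"B": 1})   # previous message had one B token
print(after_g)
print(after_b)
```

Under these assumptions, R falls slightly below the uniform share of 1/3 after a G token, while B takes most of the probability mass after a B token, matching the "slightly decreases" and "greatly increases" readings on the slide.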