An N-gram Topic Model for Time-Stamped Documents
Shoaib Jameel and Wai Lam, The Chinese University of Hong Kong
ECIR 2013, Moscow, Russia
Outline
Introduction and Motivation
◮ The Bag-of-Words (BoW) assumption
◮ Temporal nature of data
Related Work
◮ Temporal Topic Models
◮ N-gram Topic Models
Overview of our model
◮ Background
⋆ Topics Over Time (TOT) Model (proposed earlier)
⋆ Our proposed n-gram model
Empirical Evaluation
Conclusions and Future Directions
The ‘popular’ Bag-of-Words Assumption
Many works in the topic modeling literature assume exchangeability among the words, and as a result they generate ambiguous words in topics. For example, consider a few topics obtained from the NIPS collection using the Latent Dirichlet Allocation (LDA) model:
Example
Topic 1: architecture, recurrent, network, module, modules
Topic 2: order, first, second, analysis, small
Topic 3: connectionist, role, binding, structures, distributed
Topic 4: potential, membrane, current, synaptic, dendritic
Topic 5: prior, bayesian, data, evidence, experts
The problem with the LDA model: words in topics are not insightful.
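A minimal sketch of how such unigram topic lists are typically obtained, here using the gensim library on a tiny toy corpus; the corpus, topic count, and hyperparameters are illustrative assumptions, not the NIPS setup used on the slide.

```python
# Illustrative only: training plain LDA on a toy corpus and printing its
# top words, to show how bag-of-words topic lists like the ones above arise.
from gensim import corpora, models

texts = [
    ["recurrent", "network", "architecture", "module"],
    ["membrane", "potential", "synaptic", "current"],
    ["bayesian", "prior", "evidence", "data"],
    ["connectionist", "binding", "distributed", "structures"],
]

dictionary = corpora.Dictionary(texts)
# Word order plays no role here: each document is reduced to word counts.
corpus = [dictionary.doc2bow(doc) for doc in texts]

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2,
                      passes=20, random_state=0)
for topic_id, topic in lda.print_topics(num_words=4):
    print(topic_id, topic)
```

Each printed topic is just a weighted list of single words, which is exactly why the topics above read as disconnected terms rather than meaningful phrases.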
The problem with the bag-of-words assumption
1. The logical structure of the document is lost. For example, we do not know whether “the cat saw a dog” or “a dog saw a cat”.
2. Computational models cannot tap the extra word-order information inherent in the text, which hurts their performance.
3. The usefulness of maintaining word order has also been illustrated in Information Retrieval, Computational Linguistics, and many other fields.
Why capture topics over time?
1. We know that data evolves over time.
2. What people are talking about today they may not be talking about tomorrow or a year from now. For example, trending Wikipedia topics across the years 2010, 2011, and 2012 included: Gaza Strip, Burj Khalifa, N.Z. Earthquake, Volcano, Sachin Tendulkar, Manila Hostage, Osama bin Laden, China, Iraq War, Higgs Boson, Apple Inc.
3. Models such as LDA do not capture such temporal characteristics in data.
Related Work: Temporal Topic Models
Discrete-time assumption models
◮ Blei et al. (David M. Blei and John D. Lafferty, 2006) - Dynamic Topic Models - assume that topics in one year depend on the topics of the previous year.
◮ Knights et al. (Knights, D., Mozer, M., and Nicolov, N., 2009) - Compound Topic Model - trains a topic model on the most recent K months of data.
The problem here: one needs to select an appropriate time-slice value manually. The question is which time slice should be chosen: day, month, year, etc.?
Related Work: Temporal Topic Models
Continuous-time topic models
◮ Noriaki (Noriaki Kawamae, 2011) - Trend Analysis Model - has a probability distribution over temporal words and topics, and a continuous distribution over time.
◮ Uri et al. (Uri Nodelman, Christian R. Shelton, and Daphne Koller, 2002) - Continuous Time Bayesian Networks - build a graph in which each node holds a variable whose value changes over time.
The problem with the above models: all assume the notion of exchangeability and thus lose important collocation information inherent in the document.
Related Work: N-gram Topic Models
1. Wallach (Hanna M. Wallach, 2006) - Bigram Topic Model - maintains word order during the topic generation process, but generates only bigrams in topics.
2. Griffiths et al. (Griffiths, T. L., Steyvers, M., and Tenenbaum, J. B., 2007) - LDA Collocation Model - introduces binary random variables that decide whether to generate a unigram or a bigram (a minimal sketch of this indicator mechanism follows below).
3. Wang et al. (Wang, X., McCallum, A., and Wei, X., 2007) - Topical N-gram Model - extends the LDA Collocation Model and gives a topic assignment to every word in the phrase.
The problem with the above models: they cannot capture the temporal dynamics in data.
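A minimal generative sketch of the bigram-indicator idea used by the LDA Collocation Model: a binary variable decides whether the current word continues a collocation with the previous word or is drawn from a topic as in plain LDA. The vocabulary, priors, and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["neural", "network", "data", "model", "bayesian"]
V, T = len(vocab), 2

phi = rng.dirichlet(np.ones(V), size=T)    # per-topic word distributions
sigma = rng.beta(1.0, 1.0, size=V)         # P(x_i = 1 | previous word)
psi = rng.dirichlet(np.ones(V), size=V)    # bigram continuation: prev word -> next word
theta = rng.dirichlet(np.ones(T))          # topic mixture of one document

words, prev = [], None
for i in range(10):
    if prev is not None and rng.random() < sigma[prev]:
        # x_i = 1: the word continues a collocation with the previous word.
        w = rng.choice(V, p=psi[prev])
    else:
        # x_i = 0: the word is drawn from a topic, as in plain LDA.
        z = rng.choice(T, p=theta)
        w = rng.choice(V, p=phi[z])
    words.append(vocab[w])
    prev = w

print(" ".join(words))
```

None of these indicator-based models attach any distribution over time to the topics, which is the gap the present work addresses.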
Topics Over Time (TOT) (Wang et al., 2006)
1. Our model extends this model.
2. It assumes the notion of word and topic exchangeability.
Generative process of the Topics Over Time (TOT) Model:
1. Draw T multinomials φ_z from a Dirichlet prior β, one for each topic z.
2. For each document d, draw a multinomial θ^(d) from a Dirichlet prior α; then for each word w_i^(d) in the document d:
   (a) Draw a topic z_i^(d) from Multinomial θ^(d).
   (b) Draw a word w_i^(d) from Multinomial φ_{z_i^(d)}.
   (c) Draw a timestamp t_i^(d) from Beta Ω_{z_i^(d)}.
[Plate diagram of TOT: variables α, θ, z, w, t, φ, β, Ω over N_d words per document, D documents, and T topics.]
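To make the generative story above concrete, here is a minimal sketch in Python/NumPy. The sizes and symmetric Dirichlet hyperparameters are arbitrary illustrative choices, and the per-topic Beta parameters Ω are drawn at random rather than estimated from data.

```python
import numpy as np

rng = np.random.default_rng(0)

V, T, D, N_d = 6, 3, 4, 8            # vocabulary size, topics, documents, words per doc
beta, alpha = 0.1, 1.0               # symmetric Dirichlet hyperparameters

phi = rng.dirichlet(np.full(V, beta), size=T)     # step 1: per-topic word distributions
omega = rng.uniform(0.5, 5.0, size=(T, 2))        # per-topic Beta parameters over time

for d in range(D):
    theta = rng.dirichlet(np.full(T, alpha))      # step 2: per-document topic mixture
    for i in range(N_d):
        z = rng.choice(T, p=theta)                # 2(a): topic assignment
        w = rng.choice(V, p=phi[z])               # 2(b): word
        t = rng.beta(omega[z, 0], omega[z, 1])    # 2(c): normalized timestamp in [0, 1]
        print(d, z, w, round(t, 2))
```

The key difference from LDA is step 2(c): every word's topic also emits an observed, continuous timestamp, so topics that are popular in a narrow time span end up with peaked Beta distributions.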
Topics Over Time Model (TOT)
1. The model assumes a continuous distribution over time associated with each topic (the timestamp density is written out below).
2. Topics are responsible for generating both the observed timestamps and the words.
3. The model does not capture the sequence of state changes with a Markov assumption.
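For reference, the continuous distribution TOT places over a normalized timestamp t_i^(d) in [0, 1] given topic z is a Beta density; writing the two components of Ω_z as ω_{z,1} and ω_{z,2} is my own notation here.

```latex
p\!\left(t^{(d)}_{i} \mid \Omega_{z}\right)
  = \mathrm{Beta}\!\left(t^{(d)}_{i};\, \omega_{z,1}, \omega_{z,2}\right)
  = \frac{\left(t^{(d)}_{i}\right)^{\omega_{z,1}-1}
          \left(1 - t^{(d)}_{i}\right)^{\omega_{z,2}-1}}
         {B\!\left(\omega_{z,1}, \omega_{z,2}\right)}
```

Here B(·,·) is the Beta function; because the density is defined over continuous time, no discretization into days, months, or years is required.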