Supervised Topic Models
Atallah Hezbor and Anant Kharkar
Outline (optional; mostly for our reference) ● Intro [AK] ● LDA [AK] ○ Objective ○ Diagram ○ Motivation for sLDA ● sLDA ○ Expectation Maximization [AH] ○ Variational Inference [AH] ○ E-step [AH] ○ M-step [AK] ○ Prediction [AK] ● Experimental Setup [AH] ● Results/Conclusions [AH]
Introduction ● Topic modeling ○ Generally unsupervised ○ Learn topics - major clusters of content ● Latent Dirichlet Allocation ○ One method for topic modeling ○ Learn a topic assignment for each document ● Learned topics often used for prediction ○ Analogous to using PCA features for regression/lasso ● sLDA - end-to-end learned LDA + regression ● Dirichlet distribution ○ Takes a parameter vector α
Latent Dirichlet Allocation ● Objective - identify the major topics in each document ○ Topic = distribution over words ○ Use variational inference to compute parameters ○ θ (topic distribution), z (topic assignment), w (word), α (Dirichlet parameter), β (topic-word distributions) ● Posterior distribution is intractable ● Unsupervised topics may not be ideal for response prediction ○ Genres may not be the optimal topics for movie reviews
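The generative story above can be sketched in a few lines; the sizes below (K topics, V vocabulary words, N words per document) are toy values for illustration, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 3, 20, 15                       # topics, vocabulary size, words per doc (toy)

alpha = np.full(K, 0.5)                   # Dirichlet prior over topic proportions
beta = rng.dirichlet(np.ones(V), size=K)  # each topic is a distribution over words

def generate_document():
    theta = rng.dirichlet(alpha)          # per-document topic proportions
    z = rng.choice(K, size=N, p=theta)    # topic assignment for each word position
    w = np.array([rng.choice(V, p=beta[k]) for k in z])  # word drawn from its topic
    return theta, z, w

theta, z, w = generate_document()
```

sLDA extends this story with one extra step: a response drawn as y ~ N(η·z̄, σ²), where z̄ is the empirical frequency of each topic among the document's z assignments.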
Supervised Latent Dirichlet Allocation ● Extend the document generation model ○ Response variable ■ Numerical rating, number of likes ● Formulate the posterior ○ Intractable to compute exactly
Variational Inference ● Want to approximate the posterior distribution ● Use Jensen’s inequality ○ log of an expectation >= expectation of the log ● Pick a family of variational distributions, Q ● Each q in Q has variational parameters γ, φ ● Variational Expectation Maximization ○ E-step: optimize w.r.t. γ, φ ○ M-step: optimize w.r.t. model parameters
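Jensen's inequality for the concave log (log E[x] >= E[log x]) is exactly the step that turns the intractable log-likelihood into a tractable lower bound (the ELBO). A quick numerical check, with arbitrary positive samples:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(10_000) + 0.1   # arbitrary positive samples

lhs = np.log(x.mean())         # log of the expectation
rhs = np.log(x).mean()         # expectation of the log
assert lhs >= rhs              # Jensen's inequality for the concave log
```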
Expectation Step ● Model parameters are fixed ● γ - parametrizes the variational Dirichlet distribution ● φ_j - the jth word’s distribution over topics ● Maximize the lower bound with respect to γ ● Maximize the lower bound with respect to φ
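A sketch of the coordinate-ascent updates for the unsupervised part of the E-step (in sLDA the φ update also carries response-dependent terms, omitted here); the `digamma` helper is hand-rolled to keep the sketch dependency-free:

```python
import numpy as np

def digamma(x):
    """Scalar digamma via the recurrence plus an asymptotic series."""
    r = 0.0
    while x < 6:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + np.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f / 252))

digamma = np.vectorize(digamma)

def e_step(w, alpha, beta, iters=50):
    """Alternate the gamma (variational Dirichlet) and phi (per-word topic) updates."""
    K = beta.shape[0]
    N = len(w)
    gamma = alpha + N / K                       # initialize document-level gamma
    for _ in range(iters):
        # phi_nk ∝ beta[k, w_n] * exp(digamma(gamma_k))
        phi = beta[:, w].T * np.exp(digamma(gamma))
        phi /= phi.sum(axis=1, keepdims=True)
        gamma = alpha + phi.sum(axis=0)         # gamma = alpha + sum_n phi_n
    return gamma, phi

# toy run: 2 topics, 4-word vocabulary, one 5-word document
rng = np.random.default_rng(0)
beta = rng.dirichlet(np.ones(4), size=2)
gamma, phi = e_step(np.array([0, 1, 2, 3, 0]), np.full(2, 0.5), beta)
```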
Maximization Step ● Estimate model parameters by maximizing the corpus-level ELBO ● β_1:K - topic definitions (word distribution under topic k) ● Regression parameters - η, σ² ○ Corpus-level analogue of log p(response) ○ Expected-value normal equations and update rules
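A plug-in sketch of the two M-step updates (helper names are ours): β collects expected word-topic counts across the corpus, and η comes from a least-squares fit of the responses on the expected topic frequencies E_q[z̄] - a simplification that drops the covariance correction appearing in the paper's exact expected normal equations:

```python
import numpy as np

def update_beta(docs, phis, K, V):
    """beta[k, v] ∝ expected count of word v assigned to topic k over the corpus."""
    beta = np.zeros((K, V))
    for w, phi in zip(docs, phis):
        for n, v in enumerate(w):
            beta[:, v] += phi[n]
    return beta / beta.sum(axis=1, keepdims=True)

def update_eta(phis, y):
    """eta via least squares of responses y on E_q[z̄_d] (covariance term dropped)."""
    zbar = np.stack([phi.mean(axis=0) for phi in phis])  # one E_q[z̄] row per document
    return np.linalg.lstsq(zbar, y, rcond=None)[0]

# toy corpus: 3 documents, 2 topics, vocabulary of 4 words
rng = np.random.default_rng(0)
docs = [rng.integers(0, 4, size=6) for _ in range(3)]
phis = [rng.dirichlet(np.ones(2), size=6) for _ in docs]
beta = update_beta(docs, phis, K=2, V=4)
eta = update_eta(phis, y=np.array([1.0, 2.0, 3.0]))
```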
Prediction ● Learned model parameters - α, β_1:K, η, σ² ○ η - regression coefficients learned on z̄ for the response y ● Predict the response y for a new document given the learned model ● Variational approximation of the expected response
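Prediction then reduces to applying the learned regression to the variational expectation of z̄; a minimal sketch, assuming `phi` is the E-step output for the new document (run without the response terms, since y is unobserved):

```python
import numpy as np

def predict(phi, eta):
    """E[y | w] ≈ eta · E_q[z̄], with E_q[z̄] the mean of the per-word phi vectors."""
    return eta @ phi.mean(axis=0)

# toy: 5 words, 2 topics
phi = np.array([[0.9, 0.1]] * 3 + [[0.2, 0.8]] * 2)
eta = np.array([1.0, -1.0])
y_hat = predict(phi, eta)   # E_q[z̄] = (0.62, 0.38), so y_hat = 0.62 - 0.38 = 0.24
```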
Experimental Setup ● Movie review corpus [response: rating] ● Digg article corpus [response: number of Diggs] ● Compared against ○ LDA + regression ○ Lasso regression ● Metrics ○ Predictive R² ○ Per-word log-likelihood
Results ● 8% and 9.4% improvement in predictive performance ● Better topic model (per-word likelihood) for movie reviews
Conclusions ● LDA adapted to a specific purpose ○ Learns optimal topics for a specific response ● Best of both worlds ○ Predicts the response ○ Preserves high topic likelihood ● Lingering questions ○ More real-world examples - when does it work well? ○ How does it compare to deep feature learning?
Backup Slide ● Variational Distribution q