

  1. Supervised Topic Models
     Atallah Hezbor and Anant Kharkar

  2. Outline (optional; mostly for our reference)
  ● Intro [AK]
  ● LDA [AK]
    ○ Objective
    ○ Diagram
    ○ Motivation for sLDA
  ● sLDA
    ○ Expectation Maximization [AH]
    ○ Variational Inference [AH]
    ○ E-step [AH]
    ○ M-step [AK]
    ○ Prediction [AK]
  ● Experimental Setup [AH]
  ● Results/Conclusions [AH]

  3. Introduction
  ● Topic modeling
    ○ Generally unsupervised
    ○ Learn topics - major clusters of content
  ● Latent Dirichlet Allocation
    ○ One method for topic modeling
    ○ Learn a topic assignment for each document
  ● Learned topics often used for prediction
    ○ Analogous to using PCA features for regression/lasso
  ● sLDA - end-to-end learned LDA + regression
  ● Dirichlet distribution
    ○ Takes a parameter vector α (sampling sketch below)
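As a quick illustration of the last bullet (not from the slides; the alpha values are made up), a Dirichlet draw in numpy gives a point on the probability simplex, i.e., one document's distribution over topics:

    import numpy as np

    # Illustrative sketch, not from the talk: a hypothetical 3-topic
    # Dirichlet prior. Each draw lies on the simplex, i.e., it is a
    # valid per-document topic distribution.
    alpha = np.array([0.5, 0.5, 0.5])
    theta = np.random.dirichlet(alpha, size=4)  # 4 sampled topic distributions
    print(theta)                                # entries are nonnegative
    print(theta.sum(axis=1))                    # and each row sums to 1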

  4. Latent Dirichlet Allocation
  ● Objective - identify the major topics in a document
    ○ Topic = a distribution over words
    ○ Use variational inference to compute parameters
    ○ Notation: θ (per-document topic distribution), z (topic assignments), w (words), α (Dirichlet prior), β (topic-word distributions)
  ● The posterior distribution is intractable (see below)
  ● Unsupervised topics may not be ideal for response prediction
    ○ e.g., genre may not be the optimal set of topics for predicting movie review ratings
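For reference, the standard LDA joint distribution in the notation above (this formula is not on the slide, but it is the usual one):

    p(\theta, z, w \mid \alpha, \beta)
      = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta),

and the posterior p(\theta, z \mid w, \alpha, \beta) is intractable because its normalizer p(w \mid \alpha, \beta) has no closed form.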

  5. Supervised Latent Dirichlet Allocation
  ● Extend the document generation model
    ○ Add a per-document response variable
      ■ e.g., numerical rating, number of likes
  ● Formulate the posterior
    ○ Intractable to compute exactly
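The sLDA response model (from Blei and McAuliffe's paper, in the symbols above) draws the response from a normal linear model on the empirical topic frequencies of the document:

    \bar{z} = \frac{1}{N} \sum_{n=1}^{N} z_n,
    \qquad
    y \mid z_{1:N}, \eta, \sigma^2 \sim \mathcal{N}(\eta^{\top}\bar{z},\ \sigma^2),

where η are regression coefficients and σ² is the noise variance.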

  6. Variational Inference
  ● Want to approximate the posterior distribution
  ● Use Jensen's inequality
    ○ log of an expectation >= expectation of the log
  ● Pick a family of variational distributions, Q
  ● Each q in Q has variational parameters γ, φ
  ● Variational Expectation Maximization
    ○ E-step: optimize the bound w.r.t. γ, φ
    ○ M-step: optimize the bound w.r.t. the model parameters
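Concretely, Jensen's inequality turns the intractable log evidence into the evidence lower bound (ELBO) that the E- and M-steps alternately maximize:

    \log p(w, y \mid \alpha, \beta, \eta, \sigma^2)
      \ge \mathbb{E}_q[\log p(\theta, z, w, y \mid \alpha, \beta, \eta, \sigma^2)] + H(q)
      =: \mathcal{L}(\gamma, \phi),

where q is any member of the variational family Q and H(q) is its entropy.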

  7. Expectation Step
  ● Model parameters are fixed
  ● γ - parametrizes the Dirichlet distribution q(θ)
  ● φ_j - the jth word's distribution over topics
  ● Maximize the lower bound with respect to γ
  ● Maximize the lower bound with respect to φ (sketch below)
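A minimal runnable sketch of these per-document coordinate-ascent updates, written for plain (unsupervised) LDA; the sLDA φ update adds response-dependent terms that are omitted here for brevity. The names doc_words (word ids), alpha, and log_beta (log topic-word matrix) are assumptions for illustration, not names from the talk:

    import numpy as np
    from scipy.special import digamma

    def e_step(doc_words, alpha, log_beta, n_iter=50):
        """Coordinate ascent on per-document variational parameters.

        Plain-LDA sketch: sLDA's phi update also includes terms
        involving the response y and the parameters eta, sigma^2.
        """
        K = log_beta.shape[0]
        N = len(doc_words)
        phi = np.full((N, K), 1.0 / K)   # q(z_n): one row per word
        gamma = alpha + N / K            # q(theta): Dirichlet parameters
        for _ in range(n_iter):
            # phi_nk proportional to beta_{k, w_n} * exp(E_q[log theta_k])
            log_phi = log_beta[:, doc_words].T + digamma(gamma)
            phi = np.exp(log_phi - log_phi.max(axis=1, keepdims=True))
            phi /= phi.sum(axis=1, keepdims=True)
            gamma = alpha + phi.sum(axis=0)   # gamma = alpha + sum_n phi_n
        return gamma, phi

Iterating the two updates monotonically increases the ELBO for the document; the γ update is exactly the slide's "γ parametrizes the Dirichlet" step.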

  8. Maximization Step
  ● Estimate model parameters by maximizing the corpus-level ELBO
  ● β_1:K - topic definitions (word distribution under topic k)
  ● Regression parameters η, σ²
    ○ Corpus-level analogue of log p(response)
    ○ Expected-value normal equations give the update rules (below)
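From the sLDA paper, the regression updates are closed-form. Stacking the expected empirical topic frequencies E_q[z̄_d] of the D documents as the rows of a matrix (the paper calls its random version A):

    \hat{\eta} \leftarrow \left( \mathbb{E}[A^{\top} A] \right)^{-1} \mathbb{E}[A]^{\top} y,
    \qquad
    \hat{\sigma}^2 \leftarrow \frac{1}{D} \left( y^{\top} y - y^{\top}\, \mathbb{E}[A]\, \hat{\eta} \right),

the expected-value analogues of the ordinary least-squares normal equations, with expectations taken under the variational distribution q.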

  9. Prediction
  ● Learned model parameters: α, β_1:K, η, σ²
    ○ η - regression coefficients learned on z for the response y
  ● Predict the response y for a new document given the learned model
  ● Use a variational approximation (formula below)
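Given a new document w_{1:N}, run the E-step without a response term to obtain φ_{1:N}, then predict via the variational expectation of the empirical topic frequencies:

    \mathbb{E}[y \mid w, \alpha, \beta, \eta, \sigma^2]
      \approx \eta^{\top}\, \mathbb{E}_q[\bar{z}]
      = \eta^{\top} \left( \frac{1}{N} \sum_{n=1}^{N} \phi_n \right).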

  10. Experimental Setup
  ● Movie review corpus [response: ratings]
  ● Digg article corpus [response: number of diggs]
  ● Compared against
    ○ LDA + regression
    ○ Lasso regression
  ● Metrics:
    ○ Predictive R-squared (defined below)
    ○ Per-word log-likelihood
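Predictive R-squared is the usual definition: the fraction of response variance explained by the model's predictions on held-out documents,

    \mathrm{pR}^2 = 1 - \frac{\sum_d (y_d - \hat{y}_d)^2}{\sum_d (y_d - \bar{y})^2}.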

  11. Results
  ● 8% and 9.4% prediction improvement
  ● Better topic model for movie reviews

  12. Conclusions
  ● LDA adapted to a specific purpose
    ○ Learn optimal topics for a specific response
  ● Best of both worlds
    ○ Predict the response
    ○ Preserve high topic likelihood
  ● Lingering questions
    ○ More real-world examples - when does it work well?
    ○ How does it compare to deep feature learning?

  13. Backup Slide
  ● Variational distribution q
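For reference, the fully factorized (mean-field) variational family used throughout:

    q(\theta, z \mid \gamma, \phi)
      = q(\theta \mid \gamma) \prod_{n=1}^{N} q(z_n \mid \phi_n),

with q(\theta \mid \gamma) a Dirichlet and each q(z_n \mid \phi_n) a multinomial over the K topics.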
