The Structural Topic Model and Applied Social Science


  1. The Structural Topic Model and Applied Social Science. Molly Roberts, Brandon Stewart, Dustin Tingley, Edoardo Airoldi. Harvard University, Departments of Government and Statistics. December 10, 2013. Roberts Et. Al (Harvard) STM 12/10/2013 1 / 20

  4. Related Work
     ◮ Roberts ME, Stewart BM, Airoldi EM. A Topic Model for Experimentation in the Social Sciences.
     ◮ Roberts ME, Stewart BM, Tingley D, Lucas C, Leder-Luis J, Gadarian S, Albertson B, Rand D. Structural topic models for open-ended survey responses. Forthcoming at American Journal of Political Science.

  5. How Do Senators Relate to Constituents? [Figure: per-topic difference "Press Attention − Speech Attention", with topics labeled by stemmed keywords; axis "Difference" runs from −0.04 to 0.04.] Grimmer (2010, 2013)

  6. Why do some Muslim clerics support violent Jihad? [Figure: 100 topics occurring in "normal" fatwas (Jihad Score < 0). Axis: difference in topic frequencies, from −0.04 to 0.04, "Non−Jihadi Clerics <−−− topic used more by −−−> Jihadi Clerics". Points carry bootstrapped 95% confidence intervals and are grouped as "Favorite Jihadi Topics", "Evenly Split Topics", and "Favorite Non−Jihadi Topics".] Nielsen (2013)

  7. How do we analyze open-ended survey responses? [Figure: "Topic 1 and Party ID". Mean topic proportions (0.0 to 1.0) for treated and control respondents, plotted across party identification from Strong Democrat through Moderate to Strong Republican.]

  12. Social Sciences Applications
      These problems share a common structure:
      Topic models as a tool of measurement
      ◮ events between countries (O’Connor et al 2013)
      ◮ “constitutional moments” (Stewart and Young 2013)
      ◮ media control in China (Stewart and Roberts 2014)
      Extensive “metadata” in documents
      Topical Prevalence and Topical Content
      The primary quantity of interest (QOI) is how an external variable drives topics.

  16. In Practice
      ◮ “Vanilla” LDA with post-hoc comparison
      ◮ The exchangeability paradox
      ◮ Custom models vs. off the shelf
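
The "post-hoc comparison" workflow named on this slide can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: it assumes a document-by-topic proportion matrix `theta` has already been estimated by some LDA implementation, that the covariate is a binary group indicator, and that a within-group bootstrap is an acceptable way to get intervals. The function name `posthoc_topic_difference` is hypothetical.

```python
import numpy as np

def posthoc_topic_difference(theta, group, n_boot=1000, seed=0):
    """Difference in mean topic proportions between two groups of documents.

    theta : (D, K) array of estimated topic proportions (assumed given).
    group : (D,) boolean array, True for one group and False for the other.
    Returns the point estimate and a bootstrapped 95% interval per topic.
    """
    theta = np.asarray(theta, dtype=float)
    group = np.asarray(group, dtype=bool)
    rng = np.random.default_rng(seed)

    in_group = np.flatnonzero(group)
    out_group = np.flatnonzero(~group)
    diff = theta[in_group].mean(axis=0) - theta[out_group].mean(axis=0)

    # Resample documents within each group so neither group is ever empty.
    boot = np.empty((n_boot, theta.shape[1]))
    for b in range(n_boot):
        i = rng.choice(in_group, size=in_group.size, replace=True)
        o = rng.choice(out_group, size=out_group.size, replace=True)
        boot[b] = theta[i].mean(axis=0) - theta[o].mean(axis=0)

    lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)
    return diff, lower, upper
```

Note that in this two-step approach the covariate plays no role when the topics are estimated, and the intervals treat `theta` as if it were observed rather than estimated.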

  22. Our Approach
      General framework for including covariates
      Two types of covariates:
      ◮ Topical Prevalence: Logistic Normal GLM
      ◮ Topical Content: Multinomial Logit on Words
      Builds off: DMR (Mimno and McCallum 2008), SAGE (Eisenstein et al 2011) and the CTM (Blei and Lafferty 2007)

  23. Latent Dirichlet Allocation. Figure: Plate Notation of Latent Dirichlet Allocation. Graphic from David Blei’s Website: http://www.cs.princeton.edu/ blei/modeling-science.pdf

  24. Structural Topic Model [Figure: plate notation of the Structural Topic Model]
      Topic Prevalence:
      $\mu_{d,k} = X_d \gamma_k$
      $\gamma_k \sim N(0, \sigma_k^2)$
      $\sigma_k^2 \sim \mathrm{Gamma}(s^{\gamma}, r^{\gamma})$
      Language Model:
      $\theta_d \sim \mathrm{LogisticNormal}(\mu_d, \Sigma)$
      $z_{d,n} \sim \mathrm{Mult}(\theta_d)$
      $w_{d,n} \sim \mathrm{Mult}(\beta_d^{k = z_{d,n}})$
      Topical Content:
      $\beta_{d,v}^{k} \propto \exp(m_v + \kappa_v^{\cdot,k} + \kappa_v^{y,\cdot} + \kappa_v^{y,k})$
      $\kappa_v^{y,k} \sim \mathrm{Laplace}(0, \tau_v^{y,k})$
      $\tau_v^{y,k} \sim \mathrm{Gamma}(s^{\kappa}, r^{\kappa})$
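
To make the generative story on this slide concrete, here is a minimal simulation sketch for a single document. Several things are assumptions for illustration only: the parameters `gamma`, `Sigma`, `m`, and the three `kappa` arrays are passed in as fixed values rather than drawn from their priors, the logistic normal is implemented as a K-dimensional softmax of a Gaussian draw, and the function name, argument names, and array shapes are hypothetical.

```python
import numpy as np

def softmax(x):
    """Normalize along the last axis after exponentiating (numerically stable)."""
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def generate_document(x_d, y_d, gamma, Sigma, m,
                      kappa_topic, kappa_cov, kappa_inter, n_words, rng):
    """Simulate one document from the model as written on the slide.

    Assumed shapes: x_d (P,), gamma (P, K), Sigma (K, K), m (V,),
    kappa_topic (K, V), kappa_cov (Y, V), kappa_inter (Y, K, V);
    y_d is an integer index of the content covariate level.
    """
    K = gamma.shape[1]
    V = m.shape[0]

    # Topic prevalence: mu_{d,k} = X_d gamma_k
    mu_d = x_d @ gamma

    # Language model: theta_d ~ LogisticNormal(mu_d, Sigma)
    eta_d = rng.multivariate_normal(mu_d, Sigma)
    theta_d = softmax(eta_d)

    # Topical content: beta^k_{d,v} proportional to
    # exp(m_v + kappa^{.,k}_v + kappa^{y,.}_v + kappa^{y,k}_v)
    beta_d = softmax(m + kappa_topic + kappa_cov[y_d] + kappa_inter[y_d])  # (K, V)

    # z_{d,n} ~ Mult(theta_d), w_{d,n} ~ Mult(beta_d^{k = z_{d,n}})
    z = rng.choice(K, size=n_words, p=theta_d)
    w = np.array([rng.choice(V, p=beta_d[k]) for k in z])
    return theta_d, z, w
```

A call would pass `rng = np.random.default_rng(seed)` along with covariates for one document; real estimation works in the opposite direction, inferring the parameters from observed words, which this sketch does not attempt.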

  32. A Tale of Two Covariates
      Prevalence
      ◮ Prior on the mixture over topics is now document-specific
      ◮ $\eta \sim N(X\gamma, \Sigma)$
      ◮ Documents which have similar covariates will tend to talk about the same topics.
      Content
      ◮ Distribution over words is now document-specific
      ◮ Topics are sparse deviations from a word-specific baseline: $\beta_{k,g} \propto \exp(m + \kappa^{(k)} + \kappa^{(g)} + \kappa^{(k,g)})$
      ◮ Documents which have similar covariates will tend to talk about topics in the same way.
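
The "sparse deviations from a baseline" idea in the content equation can be illustrated with a toy calculation. Everything numeric here is made up for the example: the four-word vocabulary, the baseline frequencies, and the deviation values are assumptions, and `topic_word_distribution` is a hypothetical helper name.

```python
import numpy as np

def topic_word_distribution(m, kappa_k, kappa_g, kappa_kg):
    """beta_{k,g} proportional to exp(m + kappa^(k) + kappa^(g) + kappa^(k,g))."""
    logits = m + kappa_k + kappa_g + kappa_kg
    logits = logits - logits.max()  # stabilize before exponentiating
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical vocabulary and baseline log-frequencies m.
vocab = ["tax", "border", "health", "war"]
m = np.log(np.array([0.4, 0.3, 0.2, 0.1]))

# Sparse deviations: most entries are exactly zero.
kappa_k = np.array([0.0, 1.5, 0.0, 0.0])    # the topic favors "border"
kappa_g = np.zeros(4)                        # no overall group effect
kappa_kg = np.array([0.0, 0.0, 0.0, 1.0])    # one group also stresses "war" in this topic

beta_group0 = topic_word_distribution(m, kappa_k, kappa_g, np.zeros(4))
beta_group1 = topic_word_distribution(m, kappa_k, kappa_g, kappa_kg)

for word, p0, p1 in zip(vocab, beta_group0, beta_group1):
    print(f"{word:>7s}  group0={p0:.3f}  group1={p1:.3f}")
```

When every deviation is zero the distribution reduces to the baseline word frequencies, so each nonzero `kappa` entry reads directly as a departure from that baseline on the log scale.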
