Latent Dimensions of Religion and Spirituality: A Longitudinal Correlated Topic Model Seong-Hyeon (Sung) Kim 1 , Nathaniel R. Strenger 2 , & Narae Lee 1 1 Fuller Graduate School of Psychology, Pasadena, California, USA 2 Pastoral Counseling Center, Dallas, Texas, USA
Overview • Religion & Spirituality (R/S) • Research Questions • Topic models • Automated text analysis • Topics: Latent dimensions of text • Topic proportions as compositional data • Ternary diagrams • Topic correlations
Religion & Spirituality (R/S) • Definitions • Religion : “the search for significance that occurs within the context of established institutions that are designed to facilitate spirituality” (Pargament et al., 2013, p. 15). • Spirituality : “the search for the sacred” (Pargament et al., 2013, p. 14). Pargament, K. I., Mahoney, A., Exline, J. J., Jones, J. W., & Shafranske, E. P. (2013). Envisioning an integrative paradigm for the psychology of religion and spirituality. In K. I. Pargament, J. J. Exline, & J. W. Jones (Eds.), APA handbook of psychology, religion, and spirituality (Vol 1): Context, theory, and research (pp. 3 – 19). Washington, DC: American Psychological Association. https://doi.org/10.1037/14045-001
Religion & Spirituality (R/S) • Gorsuch (1984) introduced factor analysis as a tool to investigate the dimensions of R/S. • He criticized the oversupply of R/S measures. • Our research introduces topic modeling as a tool to identify the fundamental dimensions, or building blocks, of R/S as conceptualized in existing R/S measures. Gorsuch, R. L. (1984). Measurement: The boon and bane of investigating religion. American Psychologist, 39(3), 228-236. https://doi.org/10.1037/0003-066X.39.3.228
Automated Text Analysis • Quantitative (NOT qualitative) text analysis • Three Different Types 1. Dictionary method : Pre-defined set of categories 2. Supervised learning : Outcome categories known (e.g., spam mail sorting) 3. Unsupervised learning : e.g., topic modeling (outcome categories unknown)
Topic Modeling • Identify topics, the latent dimensions, in the text data • Machine (statistical) learning + computer science + statistics • Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003): Basic and popular, but does not allow topic correlations Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.
TASA Corpus: 37,000 Texts & 300 Topics Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp. 424-440). Hillsdale, NJ: Erlbaum.
Example: Steyvers & Griffiths (2007) • 2 topics • Each topic gives approximately equal probability to three words • Topic 1: “money,” “loan,” and “bank” • Topic 2: “river,” “stream,” and “bank” • 16 documents were created by arbitrarily mixing the two topics • Let’s analyze this collection of documents with LDA (Blei et al., 2003) Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp. 424-440). Hillsdale, NJ: Erlbaum.
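The mixing step in this example can be sketched in a few lines of Python. This is only an illustration of the generative idea, not the code behind the original figure: the exact 1/3 word probabilities, the 16-word document length, and the evenly spaced mixing weights are all assumptions, and `generate_document` is a hypothetical helper name.

```python
import random

# Topic-word probabilities: equal weight on three words per topic.
# The exact 1/3 values are an assumption based on "approximately equal".
TOPICS = {
    1: {"money": 1 / 3, "loan": 1 / 3, "bank": 1 / 3},
    2: {"river": 1 / 3, "stream": 1 / 3, "bank": 1 / 3},
}

def generate_document(theta1, n_words, rng):
    """Draw n_words tokens; each token comes from topic 1 with probability theta1."""
    tokens = []
    for _ in range(n_words):
        topic = 1 if rng.random() < theta1 else 2
        words = list(TOPICS[topic])
        weights = list(TOPICS[topic].values())
        tokens.append(rng.choices(words, weights=weights, k=1)[0])
    return tokens

rng = random.Random(0)
# 16 documents whose topic-1 share runs from 1.0 down to 0.0 (hypothetical weights).
mixing = [1 - i / 15 for i in range(16)]
documents = [generate_document(t, 16, rng) for t in mixing]
```

The word "bank" appears under both topics, so a topic model has to rely on the co-occurring words to decide which sense a given token belongs to.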
Steyvers & Griffiths (2007)
Example: 16 Documents
Term Distributions for Topics
Topic 1               Topic 2
Word    Probability   Word    Probability
bank    .390          stream  .391
money   .314          bank    .345
loan    .287          river   .240
river   .009          money   .012
stream  .000          loan    .012
Topic Distribution for Documents
Matrix Factorization Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In T. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp. 424-440). Hillsdale, NJ: Erlbaum.
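The factorization view can be made concrete with the term distributions reported on the earlier slide: each document's word distribution is its topic proportions multiplied by the topic-word matrix. In the sketch below, `PHI` holds the reported probabilities, while the three rows of `THETA` are hypothetical document mixtures chosen only for illustration.

```python
WORDS = ["bank", "money", "loan", "river", "stream"]

# Rows: topics; columns: words in WORDS order
# (values taken from the term-distribution slide).
PHI = [
    [0.390, 0.314, 0.287, 0.009, 0.000],  # Topic 1
    [0.345, 0.012, 0.012, 0.240, 0.391],  # Topic 2
]

# Hypothetical topic proportions for three documents (each row sums to 1).
THETA = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
]

def word_distributions(theta, phi):
    """P(word | doc) = sum over topics of P(topic | doc) * P(word | topic)."""
    n_topics, n_words = len(phi), len(phi[0])
    return [
        [sum(theta_d[k] * phi[k][w] for k in range(n_topics)) for w in range(n_words)]
        for theta_d in theta
    ]

doc_word = word_distributions(THETA, PHI)
```

Topic modeling runs this in reverse: given observed word counts per document, it estimates both factors.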
LDA & Beyond • Limitations of LDA • Fails to model correlation between topics • Stems from the implicit independence assumption in the Dirichlet distribution on the topic proportions in documents • Topics are usually correlated in texts.
LDA & Beyond • Correlated Topic Model (CTM; Blei & Lafferty, 2007) • Replaces the Dirichlet in LDA with “more flexible logistic normal distribution” (p. 19). • This paper cites Aitchison & Shen (1980), Aitchison (1982), & Aitchison (1985). Blei, D. M., & Lafferty, J. D. (2007). A correlated topic model of science. The Annals of Applied Statistics, 1(1), 17-35. https://doi.org/10.1214/07-AOAS114 Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society. Series B (Methodological), 44(2), 139-177. Aitchison, J. (1985). A general class of distributions on the simplex. Journal of the Royal Statistical Society. Series B (Methodological), 47(1), 136-146. Aitchison, J., & Shen, S. M. (1980). Logistic-normal distributions: Some properties and uses. Biometrika, 67(2), 261-272.
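To see how a logistic normal prior lets topics co-vary, the sketch below draws correlated Gaussian values and maps them onto the simplex with a softmax. It is a minimal illustration for three topics, assuming an additive log-ratio parameterization with the third topic as the reference. The function names, the means, standard deviations, and the correlation of 0.8 are all hypothetical and do not come from the CTM paper.

```python
import math
import random

def sample_topic_proportions(mu, sd, rho, rng):
    """Draw proportions for 3 topics from a logistic normal distribution."""
    # Correlated Gaussian pair built from a hand-written 2x2 Cholesky factor.
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    eta1 = mu[0] + sd[0] * z1
    eta2 = mu[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    # Softmax with the third topic fixed at eta = 0 as the reference part.
    weights = [math.exp(eta1), math.exp(eta2), 1.0]
    total = sum(weights)
    return [w / total for w in weights]

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
draws = [sample_topic_proportions((0.0, 0.0), (1.0, 1.0), 0.8, rng) for _ in range(2000)]

# Correlation between the log-ratios of topics 1 and 2 against the reference topic.
r = pearson(
    [math.log(d[0] / d[2]) for d in draws],
    [math.log(d[1] / d[2]) for d in draws],
)
```

The covariance of the underlying Gaussian is a free parameter here, which is what a Dirichlet prior cannot offer.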
Structural Topic Model (STM) • Our research used STM, which is based on CTM • Allows topic correlations • Allows covariates (i.e., predictors of topic proportions) • We collected 255 R/S measures published between 1929 and 2016 to identify the latent dimensions of text.
Atkins, D. C., Rubin, T. N., Steyvers, M., Doeden, M. A., Baucom, B. R., & Christensen, A. (2012). Topic Models: A Novel Method for Modeling Couple and Family Text Data. Journal of Family Psychology , 26 , 816-27. doi: 10.1037/a0029607
Preprocessing • R ‘tm’ package (Feinerer & Hornik, 2015) • Items of 255 R/S measures • Preprocessed texts • Removed stop words, numbers, and punctuation • e.g., a/an, the, to, for, at, she/he, I, ., or ? • Lemmatized words • e.g., educate, educated, or educating → educate Feinerer, I., & Hornik, K. (2015). tm: Text Mining Package (Version 0.6-2) [Computer software]. Retrieved from https://CRAN.R-project.org/package=tm
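The cleaning steps listed on this slide can be sketched as follows. This is a Python illustration of the idea rather than the R ‘tm’ pipeline used in the study: the stop-word set contains only the examples from the slide, and `LEMMAS` is a hypothetical lookup table standing in for a real lemmatizer.

```python
import re

# Only the stop words given as examples on the slide; a real list is much longer.
STOP_WORDS = {"a", "an", "the", "to", "for", "at", "she", "he", "i"}

# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"educated": "educate", "educating": "educate"}

def preprocess(text):
    """Lowercase, strip digits and punctuation, drop stop words, map to lemmas."""
    text = text.lower()
    # Keep only ASCII letters and whitespace (this also drops apostrophes).
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]

tokens = preprocess("She educated the 3 children at church.")
# tokens == ["educate", "children", "church"]
```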
Preprocessing • Created a document-term matrix • Dimensions: 255 × 5617 • Included • unigrams • bigrams (e.g., Jesus Christ ) • trigrams (e.g., religious (and/or) spiritual belief ) • Deleted low-frequency terms (< 3)
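A document-term matrix with unigrams, bigrams, and trigrams can be built along these lines. The sketch assumes that "low-frequency" means a total count below 3 across the whole corpus, which the slide does not specify. The function names and the three toy documents are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def document_term_matrix(docs, max_n=3, min_count=3):
    """Count 1- to max_n-grams per document and drop terms seen fewer than min_count times."""
    per_doc = []
    for tokens in docs:
        terms = []
        for n in range(1, max_n + 1):
            terms.extend(ngrams(tokens, n))
        per_doc.append(Counter(terms))
    # Assumption: the frequency threshold applies to the corpus-wide total.
    totals = Counter()
    for counts in per_doc:
        totals.update(counts)
    vocab = sorted(term for term, k in totals.items() if k >= min_count)
    matrix = [[counts[term] for term in vocab] for counts in per_doc]
    return vocab, matrix

# Toy, already-preprocessed documents (hypothetical).
docs = [
    ["jesus", "christ", "love"],
    ["jesus", "christ", "pray"],
    ["jesus", "christ", "church"],
]
vocab, matrix = document_term_matrix(docs)
# vocab == ["christ", "jesus", "jesus christ"]
```

Rows of `matrix` correspond to documents and columns to the retained terms, matching the 255 × 5617 layout described above at a much smaller scale.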
Model Estimation • R ‘stm’ package (Roberts, Stewart, & Tingley, 2016) • Topics • Latent dimensions of text data • Comparable to principal components or factors • Estimated based on word co-occurrences across documents • Structural topic modeling • Estimate covariates’ effect on topic proportions • Current analysis: Decade of publication as a predictor (1950’s through 2010’s) Roberts, M. E., Stewart, B. M., & Tingley, D. (2016). stm: R Package for Structural Topic Models (Version 1.1.3) [Computer software]. Retrieved from http://www.structuraltopicmodel.com
Top 50 Frequent Terms
Diagnostic Indexes
3 Topics Identified
• Topic 1: Spirituality
  spirituality, spiritual belief, religious spiritual, wilderness, never experience, spiritual experience, connect, illness, transcendent, transcendent spiritual
• Topic 2: Religion
  church member, loving, teaching church, dealing, dealing life, local religious, join, local religious group, question meaning life, religious denomination
• Topic 3: Judeo-Christianity
  christian, allah, miracle, god will, god god, punish, client, god feel, patient, writing
Longitudinal Change of Expected Topic Proportions from 1950’s to 2010’s The estimated regression lines and their 95% confidence intervals are plotted.
Created using R ‘compositions’ package (van den Boogaart, Tolosana, & Bren, 2015) van den Boogaart, K. G., Tolosana, R., & Bren, M. (2015). compositions: R Package for Compositional Data Analysis (Version 1.40-1) [Computer software]. Retrieved from https://cran.r-project.org/web/packages/compositions/index.html
Normal Distribution on the Simplex
Topic Correlations 1. exp(-var(z)): Buccianti & Pawlowsky-Glahn (2005) • z = ilr-transformed parts • Values near 0 (1) → high (low) variability of ratios between parts • e.g., .0016 for Topics 1 and 2 2. exp(-τ²/2): van den Boogaart & Tolosana-Delgado (2013) • τ: Variation • Interpret this as a correlation coefficient • Very small between topics Buccianti, A., & Pawlowsky-Glahn, V. (2005). New perspectives on water chemistry and compositional data analysis. Mathematical Geology, 37(7), 703-727. van den Boogaart, K. G., & Tolosana-Delgado, R. (2013). Analyzing compositional data with R (Vol. 122). Heidelberg: Springer.
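Both indices can be computed from the log-ratio of two parts across documents. The sketch below rests on two assumptions not stated on the slide: that z for a pair of topics is the two-part ilr balance ln(x_i/x_j)/√2, and that τ is the log-ratio variance var(ln(x_i/x_j)). The function names and the topic proportions are hypothetical, not the study's estimates.

```python
import math
from statistics import variance

def log_ratio_variance(x_i, x_j):
    """tau_ij = sample variance of ln(x_i / x_j) across documents."""
    return variance([math.log(a / b) for a, b in zip(x_i, x_j)])

def index_balance(x_i, x_j):
    """exp(-var(z)), with z assumed to be the two-part ilr balance."""
    z = [math.log(a / b) / math.sqrt(2) for a, b in zip(x_i, x_j)]
    return math.exp(-variance(z))

def index_variation(x_i, x_j):
    """exp(-tau^2 / 2), with tau assumed to be the log-ratio variance."""
    tau = log_ratio_variance(x_i, x_j)
    return math.exp(-tau ** 2 / 2)

# Hypothetical proportions of three topics in four documents (columns sum to 1).
topic_a = [0.10, 0.20, 0.30, 0.40]
topic_b = [0.05, 0.10, 0.15, 0.20]  # always half of topic_a
topic_c = [0.85, 0.70, 0.55, 0.40]

proportional = index_balance(topic_a, topic_b)
unrelated = index_balance(topic_a, topic_c)
```

Under these definitions, two parts that keep a constant ratio across documents give a value of 1, and both indices shrink toward 0 as the log-ratio becomes more variable.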
THANK YOU