Correlated Topic Models
Authors: Blei and Lafferty, 2006
Reviewer: Casey Hanson
Recap: Latent Dirichlet Allocation
• D ≡ set of documents.
• K ≡ set of topics.
• W ≡ set of all words, with N words in each document.
• θ_d ≡ multinomial over topics for a document d ∈ D; θ_d ~ Dir(α)
• β_k ≡ multinomial over words in a topic k ∈ K; β_k ~ Dir(η)
• z_{d,n} ≡ topic selected for word n in document d; z_{d,n} ~ Multi(θ_d)
• w_{d,n} ≡ n-th word in document d; w_{d,n} ~ Multi(β_{z_{d,n}})
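The generative process above can be condensed into a short sketch; this is a toy illustration in Python/NumPy with made-up corpus sizes and hyperparameters, not code from the paper.

```python
import numpy as np

def generate_lda_corpus(D=100, K=5, V=1000, N=50, alpha=0.1, eta=0.01, seed=0):
    """Sample a toy corpus from the LDA generative process (illustrative sizes)."""
    rng = np.random.default_rng(seed)
    beta = rng.dirichlet(np.full(V, eta), size=K)            # beta_k ~ Dir(eta): topic-word distributions
    docs = []
    for d in range(D):
        theta = rng.dirichlet(np.full(K, alpha))             # theta_d ~ Dir(alpha): doc-topic proportions
        z = rng.choice(K, size=N, p=theta)                   # z_{d,n} ~ Multi(theta_d)
        docs.append([rng.choice(V, p=beta[k]) for k in z])   # w_{d,n} ~ Multi(beta_{z_{d,n}})
    return docs, beta
```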
Latent Dirichlet Allocation
• Need to calculate the posterior: p(θ_{1:D}, z_{1:D,1:N}, β_{1:K} | w_{1:D,1:N}, α, η)
• ∝ p(θ_{1:D}, z_{1:D,1:N}, β_{1:K}, w_{1:D,1:N} | α, η)
• The normalization factor, p(w_{1:D,1:N} | α, η), is intractable.
• Need to use approximate inference, e.g. Gibbs sampling.
• Drawback: no intuitive relationship between topics.
• Challenge: develop a method similar to LDA that captures relationships between topics.
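The slide names Gibbs sampling as the approximate-inference route; below is a minimal collapsed Gibbs sketch in Python/NumPy for intuition. It is one standard way to fit LDA, not the method of the CTM paper (which uses variational inference), and the hyperparameter values are illustrative.

```python
import numpy as np

def collapsed_gibbs_lda(docs, K, V, alpha=0.1, eta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs: list of lists of word ids in [0, V)."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))                        # document-topic counts
    n_kw = np.zeros((K, V))                                # topic-word counts
    n_k = np.zeros(K)                                      # total words per topic
    z = [rng.integers(K, size=len(doc)) for doc in docs]   # random initial assignments

    for d, doc in enumerate(docs):                         # accumulate initial counts
        for n, w in enumerate(doc):
            k = z[d][n]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]                                # remove the current assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # full conditional p(z_{d,n} = k | everything else), up to a constant
                p = (n_dk[d] + alpha) * (n_kw[:, w] + eta) / (n_k + V * eta)
                k = rng.choice(K, p=p / p.sum())           # resample the topic
                z[d][n] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw
```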
Normal or Gaussian Distribution
• p(y) = (1 / √(2πσ²)) · exp(−(y − μ)² / (2σ²))
• Continuous distribution.
• Symmetrical and defined for −∞ < y < ∞.
• Parameters: N(μ, σ²)
• μ ≡ mean
• σ² ≡ variance
• σ ≡ standard deviation
• Estimation from data y = y_1 … y_n:
• μ̂ = (1/n) · Σ_{i=1}^{n} y_i
• σ̂² = (1/n) · Σ_{i=1}^{n} (y_i − μ̂)²
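A quick NumPy check of the two estimators above on simulated data (the 1/n form is the maximum-likelihood, i.e. slightly biased, variance estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=10_000)     # toy sample with mu = 2, sigma = 1.5

mu_hat = y.mean()                                   # (1/n) * sum_i y_i
var_hat = ((y - mu_hat) ** 2).mean()                # (1/n) * sum_i (y_i - mu_hat)^2
print(mu_hat, var_hat)                              # close to 2.0 and 2.25
```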
Multivariate Gaussian Distribution: k dimensions
• p(y) = p(y_1 … y_k) = (2π)^(−k/2) · det(Σ)^(−1/2) · exp(−½ (y − μ)ᵀ Σ⁻¹ (y − μ))
• y = (y_1, …, y_k)ᵀ ~ N(μ, Σ)
• μ ≡ k × 1 vector of means, one per dimension
• Σ ≡ k × k covariance matrix
• Example: 2D case
• μ = E[y] = (E[y_1], E[y_2])ᵀ = (μ_1, μ_2)ᵀ
• Σ = [ E[(y_1 − μ_1)²], E[(y_1 − μ_1)(y_2 − μ_2)] ; E[(y_1 − μ_1)(y_2 − μ_2)], E[(y_2 − μ_2)²] ]
2D Multivariate Gaussian
• Σ = [ σ_{y_1}², ρ_{y_1,y_2} σ_{y_1} σ_{y_2} ; ρ_{y_1,y_2} σ_{y_1} σ_{y_2}, σ_{y_2}² ]
• Topic correlations live on the off-diagonal.
• ρ_{y_1,y_2} σ_{y_1} σ_{y_2} = E[(y_1 − μ_1)(y_2 − μ_2)] ≈ (1/n) Σ_{i=1}^{n} (y_{i,1} − μ_1)(y_{i,2} − μ_2)
• If the covariance matrix is diagonal, the dimensions are uncorrelated.
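The sample covariance in the third bullet can be checked directly by drawing from a correlated 2D Gaussian (a sketch; the 0.4 off-diagonal value is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.4],                       # off-diagonal entries carry the correlation
                  [0.4, 1.0]])

y = rng.multivariate_normal(mu, Sigma, size=50_000)

mu_hat = y.mean(axis=0)
Sigma_hat = (y - mu_hat).T @ (y - mu_hat) / len(y)  # (1/n) * sum_i (y_i - mu)(y_i - mu)^T
print(Sigma_hat)                                    # off-diagonals close to 0.4
```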
Matlab Demo
…Back to Topic Models
• How can we adapt LDA to have correlations between topics?
• In LDA, we assume two things:
• Assumption 1: Topic proportions in a document are drawn independently of one another. θ_d ~ Dir(α)
• Assumption 2: The distribution of words in a topic is stationary. β_k ~ Dir(η)
• To sample topic distributions whose topics are correlated, we need to relax Assumption 1.
Exponential Family of Distributions
• Family of distributions that can be placed in the following form: p(y | θ) = h(y) · exp(η(θ) · T(y) − A(θ))
• Example: Binomial distribution: p(y | p) = C(n, y) · p^y · (1 − p)^(n−y), y ∈ {0, 1, 2, …, n}
• η(p) = log(p / (1 − p)), A(p) = −n log(1 − p), T(y) = y, h(y) = C(n, y)
• p(y | p) = C(n, y) · exp(y · log(p / (1 − p)) + n · log(1 − p))
• η(p) = log(p / (1 − p)) is the natural parameterization.
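A numerical sanity check of the binomial example (SciPy is used only to compare against the standard pmf):

```python
import numpy as np
from scipy.stats import binom
from scipy.special import comb

n, p = 10, 0.3
y = np.arange(n + 1)

# Exponential-family form: h(y) * exp(eta * T(y) - A), with eta = log(p/(1-p)) and A = -n*log(1-p)
eta = np.log(p / (1 - p))
pmf_expfam = comb(n, y) * np.exp(eta * y + n * np.log(1 - p))

print(np.allclose(pmf_expfam, binom.pmf(y, n, p)))  # True
```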
Categorical Distribution
• Multinomial with n = 1:
• p(y = e_1 ; θ) = Π_k θ_k^{y_k} = θ_1
• where e_1 = (1, 0, 0, …, 0)ᵀ (Iverson bracket / indicator vector)
• Σ_k y_k = 1
• Parameters: θ
• θ = (θ_1, θ_2, θ_3), where Σ_k θ_k = 1
• θ′ = (θ_1/θ_3, θ_2/θ_3, 1)
• log θ′ = (log(θ_1/θ_3), log(θ_2/θ_3), 0)
Exponential Family Multinomial with N = 1
• Goal: write p(x | θ) = Π_k θ_k^{x_k} in the exponential-family form p(y | η) = h(y) · exp(η · T(y) − A(η))
• p(x | θ) = exp(Σ_{k=1}^{K} x_k log θ_k)
• Note: only K − 1 independent dimensions in the multinomial (the θ_k sum to 1).
• η′ = [log(θ_1/θ_K), log(θ_2/θ_K), …, 0], i.e. η′_k = log(θ_k/θ_K)
• p(x | η′) = exp(η′ · x − log(1 + Σ_{k=1}^{K−1} e^{η′_k}))
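The inverse of the mapping in the last two bullets is the softmax: fix the last natural parameter to 0, exponentiate, and normalize to recover a valid topic distribution. A small sketch (function names are illustrative):

```python
import numpy as np

def theta_to_eta(theta):
    """Natural parameterization relative to the last topic: eta_k = log(theta_k / theta_K)."""
    return np.log(theta / theta[-1])                # last entry is log(1) = 0

def eta_to_theta(eta):
    """Inverse mapping (softmax): theta_k = exp(eta_k) / sum_j exp(eta_j)."""
    e = np.exp(eta - eta.max())                     # subtract max for numerical stability
    return e / e.sum()

theta = np.array([0.5, 0.3, 0.2])
eta = theta_to_eta(theta)
print(eta, eta_to_theta(eta))                       # round-trips back to [0.5, 0.3, 0.2]
```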
Verify: Classroom Participation
• Given: η = [log(θ_1/θ_K), log(θ_2/θ_K), …, 0]
• Show: p(x | θ) = Π_k θ_k^{x_k} = exp(η · x − log Σ_{k=1}^{K} e^{η_k})
Intuition and Demo
• We can sample η from any number of distributions.
• Choose the normal (allows for correlation between topic dimensions).
• Get a topic distribution for each document by sampling: η ~ N_{K−1}(μ, Σ)
• What does the mean μ represent?
• The expected deviation from the last topic: E[η_k] = E[log(θ_k/θ_K)]
• Negative means the density is pushed towards the last topic (μ_k < 0 ⇒ θ_K > θ_k on average).
• What about the covariance Σ?
• It shows how the deviations from the last topic vary, and co-vary, across topics.
• Demo settings: μ = (0, 0)ᵀ, Σ = [1 0; 0 1]
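The demo can be reproduced with a few lines of NumPy (a sketch, not the original MATLAB demo); the μ and Σ values mirror the settings on this and the following slides.

```python
import numpy as np

def sample_topic_distribution(mu, Sigma, rng):
    """Draw eta ~ N_{K-1}(mu, Sigma), fix the last topic's natural parameter to 0,
    and map through the softmax to get a K-dimensional topic distribution."""
    eta = np.append(rng.multivariate_normal(mu, Sigma), 0.0)
    e = np.exp(eta - eta.max())
    return e / e.sum()

rng = np.random.default_rng(0)
mu = np.array([-0.9, -0.9])                          # both topics pushed below topic 3
for Sigma in (np.array([[1.0, 0.0], [0.0, 1.0]]),    # uncorrelated
              np.array([[1.0, -0.9], [-0.9, 1.0]]),  # topics 1 and 2 anti-correlated
              np.array([[1.0, 0.4], [0.4, 1.0]])):   # topics 1 and 2 positively correlated
    thetas = np.array([sample_topic_distribution(mu, Sigma, rng) for _ in range(5000)])
    print(np.corrcoef(thetas[:, 0], thetas[:, 1])[0, 1])   # sample correlation of theta_1, theta_2
```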
Favoring Topic 3
• μ = (−0.9, −0.9), Σ = [1 −0.9; −0.9 1]
• μ = (−0.9, −0.9), Σ = [1 0; 0 1]
Favoring Topic 3: μ = (−0.9, −0.9), Σ = [1 0.4; 0.4 1]
Exercises
Correlated Topic Model
• Algorithm:
• For each document d ∈ D:
• Draw η_d | μ, Σ ~ N(μ, Σ)
• For each word n ∈ {1, …, N}:
• Draw topic assignment: z_{d,n} | η_d ~ Categorical(f(η_d)), where f maps η_d onto the simplex as on the previous slides
• Draw word: w_{d,n} | z_{d,n}, β_{1:K} ~ Categorical(β_{z_{d,n}})
• Parameter estimation:
• Exact inference is intractable.
• Use variational inference (later).
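A minimal NumPy sketch of this generative process (illustrative names; the topic-word distributions β are taken as given, and the variational parameter estimation is not shown):

```python
import numpy as np

def generate_ctm_corpus(mu, Sigma, beta, D=100, N=50, seed=0):
    """Sample a toy corpus from the CTM generative process.
    mu, Sigma: logistic-normal parameters over K-1 dims; beta: K x V topic-word matrix."""
    rng = np.random.default_rng(seed)
    K, V = beta.shape
    docs = []
    for d in range(D):
        eta = np.append(rng.multivariate_normal(mu, Sigma), 0.0)  # eta_d ~ N(mu, Sigma), last dim fixed at 0
        theta = np.exp(eta - eta.max()); theta /= theta.sum()     # f(eta_d): softmax onto the simplex
        z = rng.choice(K, size=N, p=theta)                        # z_{d,n} ~ Categorical(f(eta_d))
        docs.append([rng.choice(V, p=beta[k]) for k in z])        # w_{d,n} ~ Categorical(beta_{z_{d,n}})
    return docs
```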
Evaluation I: CTM on Test Data
Evaluation II: 10-Fold Cross Validation, LDA vs. CTM
• ~1500 documents in the corpus.
• ~5600 unique words after pruning.
• Methodology:
• Partition the data into 10 sets (10-fold cross validation).
• For both LDA and CTM, calculate the log likelihood of each held-out set after training on the other 9 sets.
• Figure: held-out log likelihood comparison, L(CTM) − L(LDA).
• CTM shows a much higher log likelihood as the number of topics increases.
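A sketch of the cross-validation loop described above, with hypothetical callables standing in for model fitting and held-out scoring (fit_ctm, fit_lda, and loglik are placeholders, not real library functions):

```python
import numpy as np

def cv_loglik_difference(docs, fit_ctm, fit_lda, loglik, n_topics, folds=10, seed=0):
    """Average held-out log-likelihood gap L(CTM) - L(LDA) over a k-fold split."""
    rng = np.random.default_rng(seed)
    splits = np.array_split(rng.permutation(len(docs)), folds)
    diffs = []
    for i, test_idx in enumerate(splits):
        train_idx = np.concatenate([s for j, s in enumerate(splits) if j != i])
        train = [docs[j] for j in train_idx]
        test = [docs[j] for j in test_idx]
        ctm, lda = fit_ctm(train, n_topics), fit_lda(train, n_topics)  # train on the other 9 folds
        diffs.append(loglik(ctm, test) - loglik(lda, test))            # score the held-out fold
    return np.mean(diffs)                                              # > 0 favors CTM
```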
Evaluation II: Predictive Perplexity
• Perplexity ≡ the expected number of equally likely words.
• Lower perplexity means higher word resolution.
• Suppose you observe a percentage of the words in a document: how likely are the remaining words in the document according to your model?
• CTM does better when fewer words are observed.
• It is able to infer the remaining words given the topic probabilities.
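Predictive perplexity here is the exponentiated negative log likelihood per held-out word; a minimal sketch (the log probabilities would come from a trained model and are placeholders below):

```python
import numpy as np

def predictive_perplexity(doc_log_probs, doc_word_counts):
    """exp(- total log p(held-out words) / total number of held-out words)."""
    return np.exp(-np.sum(doc_log_probs) / np.sum(doc_word_counts))

# Toy numbers: lower perplexity = the model is less "surprised" by the unseen words
print(predictive_perplexity([-300.0, -280.0], [50, 45]))
```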
Conclusions
• CTM changes the distribution from which the per-document topic proportions are drawn, from a Dirichlet to a logistic normal distribution.
• Otherwise very similar to LDA.
• Able to model correlations between topics.
• For larger numbers of topics, CTM performs better than LDA.
• With known topics, CTM is able to infer word associations better than LDA.