Machine Learning 2 DS 4420 - Spring 2020 Topic Modeling 2 Byron C. Wallace
Last time: Topic Modeling!
Word Mixtures
Idea: Model text as a mixture over words (ignoring order).
Example topics (top words with probabilities): gene 0.04, dna 0.02, genetic 0.01, ...; life 0.02, evolve 0.01, organism 0.01, ...; brain 0.04, neuron 0.02, nerve 0.01, ...; data 0.02, number 0.02, computer 0.01, ...
Topic Modeling
Idea: Model a corpus of documents with shared topics.
[Figure: shared topics, document-specific topic proportions, per-word topic assignments, and the words in a document.]
• Each topic is a distribution over words
• Each document is a mixture over topics
• Each word is drawn from one topic distribution (see the small sketch below)
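As a minimal numpy sketch of the mixture idea on this slide (the topic count, vocabulary size, and all distributions below are illustrative stand-ins, not values from the lecture):

```python
# Hedged sketch: a document's word distribution as a mixture over topics.
import numpy as np

rng = np.random.default_rng(0)
K, V = 4, 1000                              # number of topics, vocabulary size
beta = rng.dirichlet(np.ones(V), size=K)    # each row: one topic's distribution over words
theta_d = rng.dirichlet(np.ones(K))         # this document's mixture over topics

# The document's marginal word distribution: p(w | d) = sum_k theta_dk * beta_kw
p_w_given_d = theta_d @ beta
assert np.isclose(p_w_given_d.sum(), 1.0)

# Drawing one word: pick its topic assignment z, then draw the word from that topic
z = rng.choice(K, p=theta_d)
w = rng.choice(V, p=beta[z])
```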
EM for Word Mixtures (PLSA)
• Generative model
• E-step: update assignments
• M-step: update parameters (see the sketch below)
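A compact numpy sketch of these EM updates for the word-mixture model, assuming a document-term count matrix X (D documents by V vocabulary words); the variable names and vectorized layout are my own, not the in-class derivation:

```python
import numpy as np

def plsa_em(X, K, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    D, V = X.shape
    theta = rng.dirichlet(np.ones(K), size=D)   # p(z | d), shape D x K
    beta = rng.dirichlet(np.ones(V), size=K)    # p(w | z), shape K x V

    for _ in range(n_iters):
        # E-step: responsibilities q(z | d, w) ∝ p(z | d) p(w | z), shape D x V x K
        q = theta[:, None, :] * beta.T[None, :, :]
        q /= q.sum(axis=2, keepdims=True) + 1e-12

        # M-step: re-estimate parameters from expected topic counts
        expected = X[:, :, None] * q                 # expected counts per (d, w, z)
        beta = expected.sum(axis=0).T                # K x V
        beta /= beta.sum(axis=1, keepdims=True)
        theta = expected.sum(axis=1)                 # D x K
        theta /= theta.sum(axis=1, keepdims=True)
    return theta, beta
```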
Today: A Bayesian view — topic modeling with priors (or, LDA)
Latent Dirichlet Allocation (a.k.a. PLSI/PLSA with priors)
Plate notation: α (proportions parameter) → θ_d (per-document topic proportions) → Z_{d,n} (per-word topic assignment) → W_{d,n} (observed word) ← β_k (topics) ← η (topic parameter), with plates N (words), D (documents), K (topics).
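A minimal sketch of the generative process this plate diagram encodes; the sizes and hyperparameter values α and η below are illustrative choices, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, D, N = 10, 5000, 100, 200     # topics, vocab size, documents, words per document
alpha, eta = 0.1, 0.01              # Dirichlet hyperparameters (illustrative)

beta = rng.dirichlet(np.full(V, eta), size=K)       # topics: beta_k ~ Dir(eta)
docs = []
for d in range(D):
    theta_d = rng.dirichlet(np.full(K, alpha))      # proportions: theta_d ~ Dir(alpha)
    z_d = rng.choice(K, size=N, p=theta_d)          # assignments: z_{d,n} ~ Mult(theta_d)
    w_d = np.array([rng.choice(V, p=beta[z]) for z in z_d])  # words: w_{d,n} ~ Mult(beta_{z_{d,n}})
    docs.append(w_d)
```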
Dirichlet Distribution
Dirichlet Distribution
Common choice in LDA: α_k = 0.001
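A short sketch of what that choice does: with a small symmetric parameter such as α_k = 0.001, Dirichlet draws are nearly one-hot (almost all mass on one component), whereas α_k = 1 spreads mass across the simplex. K and the number of draws below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10
sparse_draws = rng.dirichlet(np.full(K, 0.001), size=5)   # each row puts ~all mass on one topic
flat_draws = rng.dirichlet(np.full(K, 1.0), size=5)       # each row spreads mass across topics

print(np.round(sparse_draws, 3))
print(np.round(flat_draws, 3))
```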
Estimation via sampling (board)
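One common way to make this concrete for LDA is collapsed Gibbs sampling, which resamples each word's topic assignment conditioned on all the others. Below is a minimal sketch, assuming docs is a list of integer word-id arrays; the function name and hyperparameter defaults are mine, not from the board work:

```python
import numpy as np

def gibbs_lda(docs, K, V, alpha=0.1, eta=0.01, n_sweeps=200, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize assignments randomly and accumulate count tables
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    ndk = np.zeros((len(docs), K))      # topic counts per document
    nkw = np.zeros((K, V))              # word counts per topic
    nk = np.zeros(K)                    # total words per topic
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]
                # Remove this word's current assignment from the counts
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # p(z = k | everything else) ∝ (ndk + alpha) * (nkw + eta) / (nk + V*eta)
                p = (ndk[d] + alpha) * (nkw[:, w] + eta) / (nk + V * eta)
                k = rng.choice(K, p=p / p.sum())
                z[d][n] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return z, ndk, nkw
```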
Extensions of LDA
• EM inference (PLSA/PLSI) yields results similar to variational inference or MAP inference (LDA) on most data
• Reason for LDA's popularity: it can be embedded in more complicated models
Extensions: Supervised LDA
Plate notation: α → θ_d → Z_{d,n} → W_{d,n} ← β_k (plates N, D, K), plus a per-document response Y_d with parameters η, σ².
1. Draw topic proportions θ | α ∼ Dir(α).
2. For each word:
   • Draw topic assignment z_n | θ ∼ Mult(θ).
   • Draw word w_n | z_n, β_{1:K} ∼ Mult(β_{z_n}).
3. Draw response variable y | z_{1:N}, η, σ² ∼ N(η^T z̄, σ²), where z̄ = (1/N) Σ_{n=1}^N z_n.
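A small sketch of just the response step (step 3): the document's response is drawn from a normal whose mean is η^T z̄, with z̄ the document's empirical topic frequencies. The values of η, σ², and the assignments below are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 10, 200
eta = rng.normal(size=K)                      # regression coefficients, one per topic
sigma2 = 0.5

z_d = rng.integers(K, size=N)                 # per-word topic assignments for one document
z_bar = np.bincount(z_d, minlength=K) / N     # z_bar = (1/N) sum_n z_n (assignments as indicator vectors)
y_d = rng.normal(eta @ z_bar, np.sqrt(sigma2))  # y | z, eta, sigma2 ~ N(eta^T z_bar, sigma2)
```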
Extensions: Supervised LDA
[Figure: movie-review topics (shown as word lists) arranged by their learned regression coefficient, roughly −30 to +20, from strongly negative words such as "least", "bad", "awful", "unfortunately" to strongly positive words such as "fascinating", "perfect", "cinematography", "effective".]
Extensions: Analyzing RateMDs ratings via “Factorial LDA”
Factors
Factorial LDA
• We use f-LDA to model topic and sentiment
• Each (topic, sentiment) pair has a word distribution
• e.g. (Systems/Staff, Negative): office, time, doctor, appointment, rude, staff, room, didn't, visit, wait
• e.g. (Systems/Staff, Positive): dr, time, staff, great, helpful, feel, questions, office, really, friendly
• e.g. (Interpersonal, Positive): dr, doctor, best, years, caring, care, patients, patient, recommend, family
• Why should the word distributions for pairs make any sense?
• Parameters are tied across the priors of each word distribution
  – The prior for (Systems, Negative) shares parameters with (Systems, Positive), which shares parameters with the prior for (Interpersonal, Positive)
Systems / Positive
[Figure: top-word lists for the Systems and Positive factors (e.g. staff, dr, time, office, wait, appointment, recommend, wonderful, highly, knowledgeable, great, helpful, friendly); the multinomial parameters are sampled from a Dirichlet prior.]
Extensions: Correlated Topic Model
Plate notation: μ, Σ → η_d → Z_{d,n} → W_{d,n} ← β_k (plates N, D, K).
Non-conjugate (logistic normal) prior on topic proportions.
Estimate a covariance matrix Σ that parameterizes correlations between topics within a document.
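A minimal sketch of that non-conjugate prior: topic proportions come from a logistic normal, i.e. a multivariate Gaussian draw pushed through a softmax, so Σ can encode correlations between topics. μ and Σ below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5
mu = np.zeros(K)
A = rng.normal(size=(K, K))
Sigma = A @ A.T + np.eye(K)                      # any positive-definite covariance

eta_d = rng.multivariate_normal(mu, Sigma)       # eta_d ~ N(mu, Sigma)
theta_d = np.exp(eta_d) / np.exp(eta_d).sum()    # softmax maps the draw onto the simplex
```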
Extensions: Dynamic Topic Models
[Figure: U.S. inaugural addresses from 1789 ("AMONG the vicissitudes incident to life no event could have filled me with greater anxieties than that of which the notification was transmitted by your order...") to 2009 ("My fellow citizens: I stand here today humbled by the task before us, grateful for the trust you have bestowed, mindful of the sacrifices borne by our ancestors...").]
Track changes in the word distributions associated with a topic over time.
Extensions: Dynamic Topic Models
Plate notation: a chain of LDA models, one per time slice (each with α, θ_d, Z_{d,n}, W_{d,n} over plates N and D), in which each topic's parameters evolve across slices: β_{k,1} → β_{k,2} → ... → β_{k,T}, for each of the K topics.
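A small sketch of how one topic's word distribution can drift across slices, assuming (as in this style of model) that the topic's natural parameters follow a Gaussian random walk and each slice's distribution is their softmax; the vocabulary size, number of slices, and step variance are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
V, T, step_var = 1000, 8, 0.01

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

beta_nat = rng.normal(size=V)                     # natural parameters for one topic at t = 1
topic_over_time = []
for t in range(T):
    topic_over_time.append(softmax(beta_nat))     # beta_{k,t}: distribution over words at slice t
    beta_nat = beta_nat + rng.normal(scale=np.sqrt(step_var), size=V)  # beta_{k,t+1} | beta_{k,t} ~ N(beta_{k,t}, step_var * I)
```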
Extensions: Dynamic Topic Models 1930 1940 1880 1890 1900 1910 1920 tube air electric electric apparatus air apparatus apparatus tube machine power steam water tube glass apparatus power company power engineering air air glass engine steam engine apparatus pressure mercury laboratory steam electrical engineering room water laboratory rubber two machine water laboratory glass pressure pressure machines two construction engineer gas made small iron system engineer made made gas mercury battery motor room gas laboratory small gas wire engine feet tube mercury 1950 1960 1970 1980 1990 2000 tube tube air high materials devices apparatus system heat power high device glass temperature power design power materials air air system heat current current chamber heat temperature system applications gate instrument chamber chamber systems technology high small power high devices devices light laboratory high flow instruments design silicon pressure instrument tube control device material rubber control design large heat technology
Summing up
• Latent Dirichlet Allocation (LDA) is a Bayesian topic model that is readily extensible
• To estimate parameters, we used a sampling-based approach. General idea: draw samples of the parameters and keep those that make the observed data likely
• Gibbs sampling is a particular variant of this approach; it draws each parameter in turn, conditioned on all the others