Nonparametric Bayesian Storyline Detection from Microtexts Vinodh Krishnan and Jacob Eisenstein Georgia Institute of Technology
Clustering microtexts into storylines

z = 1   Oct 1, 1:15pm    Strong start for Barcelona
z = 2   Oct 1, 1:23pm    Dog tuxedo bought with county credit card
z = 1   Oct 1, 1:39pm    Messi scores! Barcelona up 1-0
. . .
z = 3   Oct 8, 10:15am   Yellow card for Messi

From content alone, "Yellow card for Messi" looks like storyline 1; the week-long timestamp gap reveals it as a new storyline (z = 3). Storyline detection is a multimodal clustering problem, involving content and time.

Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
About time

Prior approaches to modeling time:
◮ Maximum temporal gap between items on the same storyline
◮ Look for attention peaks (Marcus et al., 2011)
◮ Model a temporal distribution per storyline (Ihler et al., 2006; Wang & McCallum, 2006)

Problems with these approaches:
◮ Storylines can have vastly different timescales, might be periodic, etc.
◮ Methods for determining the number of storylines are typically ad hoc.
This work

A nonparametric Bayesian framework for storylines:
◮ The number of storylines is a latent variable.
◮ No parametric assumptions about the temporal structure of storyline popularity.
◮ Text is modeled as a bag of words, but the modular framework admits arbitrary (centroid-based) models.
◮ Linear-time inference via streaming sampling.
Modeling framework

P(w, z | t) = P(z | t) \prod_{k=1}^{K} P({ w_i : z_i = k })

◮ P(z | t): prior probability of the storyline assignments, conditioned on timestamps.
◮ \prod_k P({ w_i : z_i = k }): likelihood of the text, computed per storyline.
The prior over storyline assignments

We want a prior distribution P(z | t) that is:
◮ nonparametric over the number of storylines;
◮ nonparametric over the storyline temporal distributions.

How to do it? The distance-dependent Chinese restaurant process (Blei & Frazier, 2011).
From graphs to clusterings

Key idea of the dd-CRP: "follower" graphs over documents c_1, ..., c_4 define clusterings. Each follower graph shown on the slide induces a partition:
◮ Z = ((1, 3, 4), (2))
◮ Z = ((1, 3), (2, 4))
◮ Z = ((1, 3, 4), (2))
◮ Z = ((1, 3), (2), (4))

Note that different follower graphs (the first and third) can induce the same clustering.
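The graph-to-clustering mapping can be sketched as taking connected components of the undirected version of the follower graph. This is an illustrative sketch (function and variable names are ours, not the paper's):

```python
def clusters_from_links(c):
    """Map a follower graph to a clustering: document i follows c[i],
    and c[i] == i is a self-link. Clusters are the connected components
    of the undirected version of the follower graph."""
    n = len(c)
    parent = list(range(n))

    def find(i):  # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in enumerate(c):
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# 0-indexed version of the slide's example Z = ((1, 3, 4), (2)):
# doc 0 follows doc 2, doc 1 and doc 2 self-link, doc 3 follows doc 0.
print(clusters_from_links([2, 1, 2, 0]))  # [[0, 2, 3], [1]]
```

Because cluster identity depends only on connectivity, redirecting a link inside a component leaves the clustering unchanged, which is what makes several follower graphs induce the same partition.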
Prior distribution

We reformulate the prior over follower graphs:

P(z | t) = P(c | t) = \prod_{i=1}^{N} P(c_i | t_i, t_{c_i})

P(c_i | t_i, t_{c_i}) ∝ e^{-|t_i - t_{c_i}| / a}   if c_i ≠ i
P(c_i | t_i, t_{c_i}) ∝ α                          if c_i = i

◮ The probability of two documents being linked decreases exponentially with the time gap |t_i − t_j|.
◮ The probability of a document linking to itself (starting a new cluster) is proportional to α.
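A minimal sketch of this prior (names are ours): for each document, the unnormalized weight of following another document decays exponentially in the time gap, and the self-link weight is α.

```python
import math

def link_prior(i, timestamps, a=1.0, alpha=1.0):
    """Unnormalized prior weights P(c_i = j | t) over follower links
    for document i: exp(-|t_i - t_j| / a) for j != i, alpha for j == i."""
    t_i = timestamps[i]
    return [alpha if j == i else math.exp(-abs(t_i - t_j) / a)
            for j, t_j in enumerate(timestamps)]

weights = link_prior(2, timestamps=[0.0, 0.5, 1.0], a=1.0, alpha=0.5)
# weights[2] is the self-link weight alpha; nearer documents get
# larger link weights than farther ones.
```

The decay rate a sets the relevant timescale, but because clusters are formed by chains of links, a single storyline can still span arbitrarily long time intervals.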
Likelihood

Cluster likelihoods are computed using the Dirichlet Compound Multinomial (Doyle & Elkan, 2009):

P(w) = \prod_{k=1}^{K} P({ w_i }_{z_i = k})
     = \prod_{k=1}^{K} \int_{\theta} P_{MN}({ w_i }_{z_i = k} | \theta_k) P_{Dir}(\theta_k ; \eta) d\theta_k
     = \prod_{k=1}^{K} P_{DCM}({ w_i }_{z_i = k} ; \eta),

where η is a concentration hyperparameter.
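The DCM (Pólya) log-likelihood of a cluster's aggregate word counts has a closed form in log-gamma functions. A sketch with a symmetric concentration η (assumption on our part: we drop the count-vector normalization constant, which cancels in likelihood ratios):

```python
from math import lgamma

def dcm_log_likelihood(counts, eta=0.1):
    """log P_DCM(counts; eta) under a symmetric Dirichlet(eta) prior:
    lgamma(V*eta) - lgamma(V*eta + n)
      + sum_v [lgamma(eta + x_v) - lgamma(eta)],
    where V is the vocabulary size and n = sum_v x_v."""
    V, n = len(counts), sum(counts)
    return (lgamma(V * eta) - lgamma(V * eta + n)
            + sum(lgamma(eta + x) - lgamma(eta) for x in counts))

# The DCM rewards compact word distributions: the same total count
# concentrated on few words is more probable than counts spread evenly.
compact = dcm_log_likelihood([6, 0, 0, 0])
spread = dcm_log_likelihood([2, 2, 1, 1])
assert compact > spread
```

This compactness reward is exactly the property visualized on the next slide, and it is what pushes documents with overlapping vocabulary into the same storyline.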
The Dirichlet Compound Multinomial

The DCM is a distribution over vectors of counts, which rewards compact word distributions.

[Figure: left panel shows log P(w) for two count vectors w_1 and w_2 as a function of the concentration η (10^-5 to 10^2); right panel shows their per-word counts over an example vocabulary: Messi, card, Barcelona, yellow, credit, tuxedo, goal.]

We set the hyperparameter η using a heuristic from Minka (2012).
Inference: Gibbs sampling

◮ We iteratively cut and resample each follower link.
◮ Each link is sampled from the joint probability:

P(c_i = j | c_{-i}, w) ∝ P(c_i = j) × P({ w_k : z_k = z_i ∨ z_k = z_j }) / [ P({ w_k : z_k = z_i }) × P({ w_k : z_k = z_j }) ]

Example: resampling c_4 after cutting its link, with clusters {1, 3} and {2}:
◮ c_4 = 1: ∝ e^{-|t_4 - t_1| / a} × P({ w_1, w_3, w_4 }) / [ P({ w_4 }) × P({ w_1, w_3 }) ]
◮ c_4 = 2: ∝ e^{-|t_4 - t_2| / a} × P({ w_2, w_4 }) / [ P({ w_4 }) × P({ w_2 }) ]
◮ c_4 = 3: ∝ e^{-|t_4 - t_3| / a} × P({ w_1, w_3, w_4 }) / [ P({ w_4 }) × P({ w_1, w_3 }) ]
◮ c_4 = 4 (self-link): ∝ α × P({ w_4 }) / P({ w_4 }) = α

Vinodh Krishnan and Jacob Eisenstein: Nonparametric Bayesian Storyline Detection from Microtexts
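One resampling step can be sketched by combining the exponential link prior with the DCM likelihood ratio. This is a simplified sketch under our own naming and with a symmetric DCM concentration, not the authors' implementation:

```python
import math
import random
from math import lgamma

def dcm_ll(counts, eta=0.1):
    # DCM log-likelihood of aggregate counts (normalizer dropped)
    V, n = len(counts), sum(counts)
    return (lgamma(V * eta) - lgamma(V * eta + n)
            + sum(lgamma(eta + x) - lgamma(eta) for x in counts))

def components(c):
    # connected components of the undirected follower graph
    n = len(c)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in enumerate(c):
        parent[find(i)] = find(j)
    comp = {}
    for i in range(n):
        comp.setdefault(find(i), set()).add(i)
    return list(comp.values())

def resample_link(i, c, t, X, a=1.0, alpha=1.0, rng=random):
    """Cut c_i (reset to a self-link), then sample a new link j with
    weight prior(c_i = j) times the DCM ratio for merging document i's
    cluster with j's. X[d] is the word-count vector of document d."""
    c[i] = i  # cut the link
    comp_of = {d: frozenset(s) for s in components(c) for d in s}

    def counts(docs):  # aggregate word counts over a set of documents
        return [sum(X[d][v] for d in docs) for v in range(len(X[0]))]

    log_w = []
    for j in range(len(c)):
        lp = math.log(alpha) if j == i else -abs(t[i] - t[j]) / a
        if j != i and comp_of[j] != comp_of[i]:
            merged = comp_of[i] | comp_of[j]
            lp += (dcm_ll(counts(merged))
                   - dcm_ll(counts(comp_of[i]))
                   - dcm_ll(counts(comp_of[j])))
        log_w.append(lp)
    m = max(log_w)  # log-sum-exp trick for numerical stability
    w = [math.exp(x - m) for x in log_w]
    c[i] = rng.choices(range(len(c)), weights=w)[0]
    return c

# Toy example: docs 0 and 1 share vocabulary and are close in time;
# doc 2 is distant in both, so it almost surely keeps its self-link.
t = [0.0, 0.1, 5.0]
X = [[3, 1, 0, 0], [2, 2, 0, 0], [0, 0, 4, 1]]
c = resample_link(2, [0, 0, 2], t, X, rng=random.Random(0))
```

Because cutting and re-drawing one link touches only the two affected clusters, the likelihood ratio involves just three DCM terms, which is what makes streaming, linear-time inference practical.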