Nonparametric Bayesian Storyline Detection from Microtexts


1. Nonparametric Bayesian Storyline Detection from Microtexts
   Vinodh Krishnan and Jacob Eisenstein, Georgia Institute of Technology

2. Clustering microtexts into storylines
   - "Strong start for Barcelona"
   - "Dog tuxedo bought with county credit card"
   - "Messi scores! Barcelona up 1-0"
   - ...
   - "Yellow card for Messi"

3. Clustering microtexts into storylines
   - z = 1: "Strong start for Barcelona"
   - z = 2: "Dog tuxedo bought with county credit card"
   - z = 1: "Messi scores! Barcelona up 1-0"
   - ...
   - z = 1: "Yellow card for Messi"

4. Clustering microtexts into storylines
   - z = 1, Oct 1, 1:15pm: "Strong start for Barcelona"
   - z = 2, Oct 1, 1:23pm: "Dog tuxedo bought with county credit card"
   - z = 1, Oct 1, 1:39pm: "Messi scores! Barcelona up 1-0"
   - ...
   - z = 3, Oct 8, 10:15am: "Yellow card for Messi"

5. Clustering microtexts into storylines
   - z = 1, Oct 1, 1:15pm: "Strong start for Barcelona"
   - z = 2, Oct 1, 1:23pm: "Dog tuxedo bought with county credit card"
   - z = 1, Oct 1, 1:39pm: "Messi scores! Barcelona up 1-0"
   - ...
   - z = 3, Oct 8, 10:15am: "Yellow card for Messi"
   Storyline detection is a multimodal clustering problem, involving both content and time.

6. About time
   Prior approaches to modeling time:
   - Maximum temporal gap between items on the same storyline
   - Look for attention peaks (Marcus et al., 2011)
   - Model the temporal distribution per storyline (Ihler et al., 2006; Wang & McCallum, 2006)

7. About time
   Prior approaches to modeling time:
   - Maximum temporal gap between items on the same storyline
   - Look for attention peaks (Marcus et al., 2011)
   - Model the temporal distribution per storyline (Ihler et al., 2006; Wang & McCallum, 2006)
   Problems with these approaches:
   - Storylines can have vastly different timescales, may be periodic, etc.
   - Methods for determining the number of storylines are typically ad hoc.

8. This work
   A nonparametric Bayesian framework for storylines:
   - The number of storylines is a latent variable.
   - No parametric assumptions about the temporal structure of storyline popularity.
   - Text is modeled as a bag of words, but the modular framework admits arbitrary (centroid-based) models.
   - Linear-time inference via streaming sampling.

9. Modeling framework

   P(w, z \mid t) = P(z \mid t) \prod_{k=1}^{K} P(\{ w_i : z_i = k \})

   The first factor is the prior probability of the storyline assignments, conditioned on the timestamps; the product is the likelihood of the text, computed per storyline.

10. The prior over storyline assignments
    We want a prior distribution P(z | t) that is:
    - nonparametric over the number of storylines;
    - nonparametric over the storyline temporal distributions.
    How to do it?

11. The prior over storyline assignments
    We want a prior distribution P(z | t) that is:
    - nonparametric over the number of storylines;
    - nonparametric over the storyline temporal distributions.
    How to do it? The distance-dependent Chinese restaurant process (Blei & Frazier, 2011).

12-15. From graphs to clusterings
    Key idea of the dd-CRP: "follower" graphs define clusterings. Each document follows exactly one document (possibly itself), and the storylines are the connected components of the resulting graph. Note that the same partition can arise from different follower graphs, as the repeated clustering below shows; a code sketch of the graph-to-clustering mapping follows this slide.
    [Figure: follower graph over documents c_1, ..., c_4, with the links redrawn on each slide]
    - Z = ((1, 3, 4), (2))
    - Z = ((1, 3), (2, 4))
    - Z = ((1, 3, 4), (2))
    - Z = ((1, 3), (2), (4))
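As a concrete illustration of the follower-graph idea (not from the talk), here is a minimal Python sketch. Documents are 0-indexed, and the function and variable names (follower_to_clusters, c, labels) are invented for this example.

```python
# A follower array c has c[i] = the index that document i follows (itself
# included). Storylines are the connected components of the undirected graph
# with an edge i -- c[i] for every i.

def follower_to_clusters(c):
    """Map a follower array to cluster labels via depth-first search."""
    n = len(c)
    adj = [set() for _ in range(n)]
    for i, j in enumerate(c):
        adj[i].add(j)
        adj[j].add(i)
    labels = [-1] * n
    k = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        stack = [start]          # traverse one connected component
        while stack:
            i = stack.pop()
            if labels[i] == -1:
                labels[i] = k
                stack.extend(adj[i])
        k += 1
    return labels

# The slides' first example, 0-indexed: documents 0 and 1 follow themselves,
# 2 follows 0, and 3 follows 2, giving Z = ((1, 3, 4), (2)) in slide numbering.
print(follower_to_clusters([0, 1, 0, 2]))  # [0, 1, 0, 0]
```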

16. Prior distribution
    We reformulate the prior over follower graphs:

    P(z \mid t) = P(c \mid t) = \prod_{i=1}^{N} P(c_i \mid t_i, t_{c_i})

    P(c_i \mid t_i, t_{c_i}) \propto \begin{cases} e^{-|t_i - t_{c_i}|/a}, & c_i \neq i \\ \alpha, & c_i = i \end{cases}

    - The probability of two documents being linked decreases exponentially with the time gap |t_i - t_j|.
    - The probability of a document linking to itself (starting a new cluster) is proportional to α.
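The link prior on slide 16 translates directly into code. Below is a hedged sketch, assuming timestamps are floats in consistent units; link_log_prior is an invented name, and the scores are left unnormalized, as in the slide.

```python
import numpy as np

def link_log_prior(i, t, a, alpha):
    """Unnormalized log P(c_i = j) for every candidate target j.

    t is an array of timestamps; a is the exponential decay scale and
    alpha the self-link hyperparameter from slide 16.
    """
    logp = -np.abs(t[i] - t) / a   # link probability decays with the time gap
    logp[i] = np.log(alpha)        # self-link: document i starts a new storyline
    return logp
```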

17. Modeling framework (recap)

    P(w, z \mid t) = P(z \mid t) \prod_{k=1}^{K} P(\{ w_i : z_i = k \})

    The dd-CRP supplies the prior over storyline assignments, conditioned on timestamps; next is the likelihood of the text, computed per storyline.

18. Likelihood
    Cluster likelihoods are computed using the Dirichlet Compound Multinomial (Doyle & Elkan, 2009):

    P(w) = \prod_{k=1}^{K} P(\{w_i\}_{z_i = k})
         = \prod_{k=1}^{K} \int_{\theta} P_{\mathrm{MN}}(\{w_i\}_{z_i = k} \mid \theta_k) \, P_{\mathrm{Dir}}(\theta_k; \eta) \, d\theta_k
         = \prod_{k=1}^{K} P_{\mathrm{DCM}}(\{w_i\}_{z_i = k}; \eta),

    where η is a concentration hyperparameter.
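The DCM integral on slide 18 has a closed form in terms of gamma functions. The following sketch (not the authors' code; dcm_log_likelihood is an invented name) computes the per-cluster log-likelihood with a symmetric concentration η, dropping the multinomial coefficient since it cancels in the Gibbs ratios used later.

```python
import numpy as np
from scipy.special import gammaln

def dcm_log_likelihood(counts, eta):
    """log P_DCM(counts; eta) with symmetric concentration eta, up to the
    order-dependent multinomial coefficient (constant in the Gibbs ratios)."""
    counts = np.asarray(counts, dtype=float)
    V, N = counts.size, counts.sum()   # vocabulary size, total tokens
    return (gammaln(V * eta) - gammaln(V * eta + N)
            + gammaln(counts + eta).sum() - V * gammaln(eta))
```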

19. The Dirichlet Compound Multinomial
    The DCM is a distribution over vectors of counts which rewards compact word distributions.
    [Figure: left, word counts for two example clusters w_1 and w_2 over the vocabulary {Messi, card, Barcelona, yellow, credit, tuxedo, goal}; right, log P(w) as a function of the concentration parameter η.]
    We set the hyperparameter η using a heuristic from Minka (2012).
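A tiny demo (hypothetical counts, reusing the dcm_log_likelihood sketch above) makes the "rewards compact distributions" point concrete: two clusters with the same total count, one concentrated on two words and one spread across all seven.

```python
# Hypothetical counts over the slide's vocabulary
# {Messi, card, Barcelona, yellow, credit, tuxedo, goal}.
compact = [5, 0, 0, 0, 0, 0, 5]   # mass on two words
diffuse = [2, 1, 2, 1, 2, 1, 1]   # same total (10), spread out
for eta in (0.1, 1.0, 10.0):
    print(eta, dcm_log_likelihood(compact, eta),
          dcm_log_likelihood(diffuse, eta))
# The compact cluster scores higher, and the gap grows as eta shrinks.
```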

20. Modeling framework (recap)

    P(w, z \mid t) = P(z \mid t) \prod_{k=1}^{K} P(\{ w_i : z_i = k \})

    With the prior and the per-storyline likelihood in place, what remains is inference.

21. Inference: Gibbs sampling
    [Figure: follower graph over documents c_1, ..., c_4]
    - We iteratively cut and resample each link.
    - Each link is sampled from the joint probability:

    P(c_i = j \mid c_{-i}, w) \propto P(c_i = j) \times \frac{P(\{w_k : z_k = z_i \vee z_k = z_j\})}{P(\{w_k : z_k = z_i\}) \, P(\{w_k : z_k = z_j\})}


23-26. Inference: Gibbs sampling, worked example
    [Figure: the link out of c_4 is cut, leaving clusters {1, 3}, {2}, and {4}; each candidate target is scored in turn]
    Resampling c_4, the candidate links are scored as:

    P(c_4 = 1) \propto e^{-|t_4 - t_1|/a} \times \frac{P(\{w_1, w_3, w_4\})}{P(\{w_4\}) \, P(\{w_1, w_3\})}

    P(c_4 = 2) \propto e^{-|t_4 - t_2|/a} \times \frac{P(\{w_2, w_4\})}{P(\{w_4\}) \, P(\{w_2\})}

    P(c_4 = 3) \propto e^{-|t_4 - t_3|/a} \times \frac{P(\{w_1, w_3, w_4\})}{P(\{w_4\}) \, P(\{w_1, w_3\})}

    P(c_4 = 4) \propto \alpha \times \frac{P(\{w_4\})}{P(\{w_4\})} = \alpha

    A code sketch of this step follows below.
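Putting the pieces together, here is a hedged sketch of one Gibbs step in the style of slides 21-26. It reuses the hypothetical helpers sketched earlier (follower_to_clusters, link_log_prior, dcm_log_likelihood); X is assumed to be a documents-by-vocabulary count matrix and rng a numpy Generator. This illustrates the sampling rule only, not the authors' streaming, linear-time implementation.

```python
import numpy as np

def resample_link(i, c, t, X, a, alpha, eta, rng):
    """Cut document i's link, score every candidate target j by
    prior x likelihood ratio (as on the slides), and sample a new c[i]."""
    c[i] = i                                   # cut the old link
    labels = np.asarray(follower_to_clusters(c))
    log_prior = link_log_prior(i, t, a, alpha)

    in_i = labels == labels[i]                 # i's cluster after the cut
    ll_i = dcm_log_likelihood(X[in_i].sum(axis=0), eta)

    logp = np.empty(len(c))
    for j in range(len(c)):
        if labels[j] == labels[i]:
            logp[j] = log_prior[j]             # no merge: likelihood ratio is 1
        else:
            in_j = labels == labels[j]
            merged = dcm_log_likelihood(X[in_i | in_j].sum(axis=0), eta)
            logp[j] = (log_prior[j] + merged - ll_i
                       - dcm_log_likelihood(X[in_j].sum(axis=0), eta))

    p = np.exp(logp - logp.max())              # stable normalization
    c[i] = rng.choice(len(c), p=p / p.sum())
    return c
```

Note that the self-link candidate j = i falls into the "no merge" branch, so its score reduces to α, exactly as on slide 26.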
