
Who, What, When, Where, and Why? A Computational Approach to Understanding Historical Events Using State Department Cables



  1. Who, What, When, Where, and Why? A Computational Approach to Understanding Historical Events Using State Department Cables. Allison J.B. Chaney (Princeton University), Hanna Wallach (Microsoft Research), David M. Blei (Columbia University), Matthew Connelly (History Lab at Columbia)

  2. We can do nothing but scrutinize historical events themselves if we want to discover what they are. – Dean W.R. Matthews, What is an Historical Event?

  3. Who? What? When? Where? } observable ways to characterize unobservable events

  4. Why? Who? What? When? Where? } observable ways to characterize unobservable events

  5. data • communications between the U.S. State Department and its embassies (“cables”) • around two million cables • sent between 1973 and 1977

  6. BANGKOK CANBERRA HONG KONG MANILA STATE SEOUL SINGAPORE TOKYO PHNOM PENH SAIGON TAIPEI PEKING VIENTIANE

  7. key actors cables entities events

  8. representing cables

  9. representing cables

  10. representing cables

  11. representing cables Latent Dirichlet allocation. Blei, Ng, and Jordan, 2003.

  12. documents sent SAIGON

  13. documents sent SAIGON typical concerns

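  14. documents sent SAIGON: $\theta_1, \theta_2, \theta_3, \ldots, \theta_d$; typical concerns

  15. documents sent SAIGON: $\theta_1, \theta_2, \theta_3, \ldots, \theta_d$; typical concerns: $\phi_{0k} \sim \mathrm{Gamma}(\alpha_\phi, \mu_\phi / \alpha_\phi)$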
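
The Gamma prior on slide 15 can be sketched in a few lines. This is a minimal illustration, assuming the second Gamma argument is a scale, so that $\mathrm{Gamma}(\alpha, \mu/\alpha)$ has mean $\mu$; the slides do not state the convention. The number of topics `K`, the hyperparameter values, and the helper name `sample_gamma_mean` are hypothetical.

```python
import numpy as np

def sample_gamma_mean(shape, mean, size, rng):
    """Draw from Gamma(shape, scale=mean/shape); under this (assumed)
    shape/scale reading the expected value equals `mean`."""
    return rng.gamma(shape, mean / shape, size=size)

rng = np.random.default_rng(0)
K = 5                          # hypothetical number of topics
alpha_phi, mu_phi = 2.0, 0.5   # hypothetical hyperparameters

# One "typical concerns" vector phi_0 for a single entity (e.g. Saigon).
phi0 = sample_gamma_mean(alpha_phi, mu_phi, K, rng)

# Many draws, only to check empirically that the mean is close to mu_phi.
draws = sample_gamma_mean(alpha_phi, mu_phi, 200_000, rng)
print(phi0, draws.mean())
```

Under this reading, $\alpha_\phi$ controls how tightly draws concentrate around the mean $\mu_\phi$.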

  16. documents sent SAIGON typical concerns events 1973 1977

  17. documents sent SAIGON typical concerns events 1973 1977

  18. modeling events 1973 1977

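  19. modeling events 1973 1977 WHEN? $\epsilon_i \sim \mathrm{Poisson}(\eta_\epsilon)$

  20. modeling events 1973 1977 WHEN? $\epsilon_i \sim \mathrm{Poisson}(\eta_\epsilon)$ WHAT? $\pi_{ik} \sim \mathrm{Gamma}(\alpha_\pi, \mu_\pi / \alpha_\pi)$

  21. modeling events 1973 1977 WHEN? $\epsilon_i \sim \mathrm{Poisson}(\eta_\epsilon)$ WHAT? $\pi_{ik} \sim \mathrm{Gamma}(\alpha_\pi, \mu_\pi / \alpha_\pi)$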
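
The two event priors (WHEN and WHAT) can be simulated together. This is a rough sketch, not the authors' code: the weekly time grid, the number of topics, and all hyperparameter values are assumptions made for illustration, and the Gamma is again read as shape/scale.

```python
import numpy as np

rng = np.random.default_rng(1)
T, K = 260, 5                  # hypothetical: ~weekly steps over 1973-1977, 5 topics
eta_eps = 0.05                 # hypothetical event rate per time step
alpha_pi, mu_pi = 2.0, 1.0     # hypothetical hyperparameters

# WHEN: event occurrence at each time step i.
eps = rng.poisson(eta_eps, size=T)

# WHAT: event content over topics for each time step i (mean mu_pi per topic).
pi = rng.gamma(alpha_pi, mu_pi / alpha_pi, size=(T, K))

# Time steps at which at least one event occurs.
event_steps = np.flatnonzero(eps)
print(len(event_steps), "time steps with events")
```

With a small rate such as the one assumed here, most time steps have no event, which matches the intuition that events are rare relative to routine traffic.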

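  22. modeling cables 1973 1977 $\phi_{jk} = \phi_{0k} + \sum_i f(a_i, c_j)\,\pi_{ik}$, where $\phi_{0k}$ is the typical concerns, the sum runs over all events, $\pi_{ik}$ is the event description, and $f(a_i, c_j)$ is the decay of relevancy

  23. modeling cables 1973 1977 $\phi_{jk} = \phi_{0k} + \sum_i f(a_i, c_j)\,\pi_{ik}$, where $\phi_{0k}$ is the typical concerns, the sum runs over all events, $\pi_{ik}$ is the event description, and $f(a_i, c_j)$ is the decay of relevancy; $\theta_{jk} \sim \mathrm{Gamma}(\alpha_\theta, \phi_{jk} / \alpha_\theta)$

  24. modeling cables ? ? ? 1973 1977 $\phi_{jk} = \phi_{0k} + \sum_i f(a_i, c_j)\,\pi_{ik}$, where $\phi_{0k}$ is the typical concerns, the sum runs over all events, $\pi_{ik}$ is the event description, and $f(a_i, c_j)$ is the decay of relevancy; $\theta_{jk} \sim \mathrm{Gamma}(\alpha_\theta, \phi_{jk} / \alpha_\theta)$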
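
The cable model adds decayed event content to an entity's typical concerns and then draws cable topics around that mean. The sketch below makes several assumptions the slides leave open: $a_i$ and $c_j$ are taken to be the event time and the cable time, the decay $f$ is linear over a fixed `duration`, and the Gamma is read as shape/scale. All numeric values are hypothetical.

```python
import numpy as np

def decay(a, c, duration=4.0):
    """Hypothetical linear decay of relevancy: 1 at the event time,
    falling to 0 after `duration`; 0 for cables sent before the event."""
    lag = c - a
    return np.where((lag >= 0) & (lag < duration), 1.0 - lag / duration, 0.0)

def cable_topic_means(phi0, pi, event_times, cable_times, duration=4.0):
    """phi[j, k] = phi0[k] + sum_i f(a_i, c_j) * pi[i, k]."""
    f = decay(event_times[None, :], cable_times[:, None], duration)  # (cables, events)
    return phi0[None, :] + f @ pi

rng = np.random.default_rng(2)
phi0 = np.array([0.5, 0.2, 0.3])            # typical concerns
pi = np.array([[0.0, 2.0, 0.0],             # event 0 description
               [1.0, 0.0, 0.0]])            # event 1 description
event_times = np.array([10.0, 20.0])        # a_i
cable_times = np.array([5.0, 10.0, 12.0, 21.0])  # c_j

phi = cable_topic_means(phi0, pi, event_times, cable_times)

# Cable topics: theta[j, k] ~ Gamma(alpha_theta, phi[j, k] / alpha_theta), mean phi[j, k].
alpha_theta = 5.0
theta = rng.gamma(alpha_theta, phi / alpha_theta)
print(phi)
```

A cable sent before any event keeps the baseline `phi0`, while a cable sent shortly after an event has its topic means pulled toward that event's description.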

  25. How do we find the values of the hidden parameters that best fit the data? observed data + model assumptions -> black box variational inference -> learned parameters (entity typical concerns, event occurrence, event content) -> exploration. Black box variational inference. Ranganath, Gerrish, and Blei, 2014.
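
For reference, the core of black box variational inference can be written down compactly. The notation here is generic and not taken from the slides: $x$ stands for the observed cables, $z$ for all hidden parameters, and $q(z \mid \lambda)$ for a variational family with parameters $\lambda$. The method maximizes the evidence lower bound using a Monte Carlo estimate of its gradient that needs only samples from $q$ and evaluations of the log joint:

```latex
\begin{align}
\mathcal{L}(\lambda)
  &= \mathbb{E}_{q(z \mid \lambda)}\!\left[\log p(x, z) - \log q(z \mid \lambda)\right], \\
\nabla_\lambda \mathcal{L}(\lambda)
  &= \mathbb{E}_{q(z \mid \lambda)}\!\left[\nabla_\lambda \log q(z \mid \lambda)\,
     \bigl(\log p(x, z) - \log q(z \mid \lambda)\bigr)\right], \\
\nabla_\lambda \mathcal{L}(\lambda)
  &\approx \frac{1}{S} \sum_{s=1}^{S} \nabla_\lambda \log q(z_s \mid \lambda)\,
     \bigl(\log p(x, z_s) - \log q(z_s \mid \lambda)\bigr),
  \qquad z_s \sim q(z \mid \lambda).
\end{align}
```

Because the estimate never requires model-specific derivations, the same procedure applies when the model assumptions change, which is what makes the inference "black box".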

  26. validation • compare discovered events to manually collected examples of known historical events (and corresponding cables) • How many of the known events are recovered? • How does the average topic distribution of the known cables compare to the discovered event distribution? • present the discovered events (date, topic distribution, and entities involved) to an expert historian
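
The two quantitative checks on the validation slide could be computed along the following lines. This is only a sketch under stated assumptions: the slides do not say how a known event counts as "recovered" or which similarity measure is used, so the date `tolerance` and the cosine similarity are illustrative choices, and all data below are made up for the example.

```python
import numpy as np

def event_recall(known_dates, discovered_dates, tolerance=1):
    """Fraction of known events that have a discovered event
    within `tolerance` time steps (hypothetical matching rule)."""
    discovered = np.asarray(discovered_dates)
    hits = [np.any(np.abs(discovered - d) <= tolerance) for d in known_dates]
    return float(np.mean(hits))

def cosine_similarity(u, v):
    """Cosine similarity between two topic vectors (assumed measure)."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy data: time-step indices of known and discovered events.
known_dates = [10, 50, 120]
discovered_dates = [11, 80, 119]
recall = event_recall(known_dates, discovered_dates)

# Toy data: topic distributions of cables tied to one known event,
# compared against the discovered event's topic distribution.
known_cable_topics = np.array([[0.1, 0.8, 0.1],
                               [0.2, 0.6, 0.2]])
avg_known = known_cable_topics.mean(axis=0)
event_topics = np.array([0.15, 0.7, 0.15])
similarity = cosine_similarity(avg_known, event_topics)
print(recall, similarity)
```

In this toy example two of the three known events fall within one time step of a discovered event, and the discovered topic distribution matches the average of the known cables.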

  27. exploration: Saigon φ 0

  28. exploration: Saigon π i

  29. exploration: Saigon π i

  30. results summary • topic models can describe documents, but they cannot identify when events occur • we explicitly model event occurrences and event content • our model can be used to identify and explore events

  31. future work • Main next step: share events between entities • Other areas: • include interactions between entities • learn event duration • explore different event decay shapes • thorough model validation

  32. Thank you! Questions and suggestions welcome.
