Who, What, When, Where, and Why? A Computational Approach to Understanding Historical Events Using State Department Cables
Allison J.B. Chaney (Princeton University), Hanna Wallach (Microsoft Research), David M. Blei (Columbia University), Matthew Connelly (History Lab at Columbia)
We can do nothing but scrutinize historical events themselves if we want to discover what they are. – Dean W.R. Matthews, What is an Historical Event?
Who? What? When? Where? } observable ways to characterize events
Why? } unobservable
data
• communications between the U.S. State Department and its embassies (“cables”)
• around two million cables
• sent between 1973 and 1977
BANGKOK CANBERRA HONG KONG MANILA STATE SEOUL SINGAPORE TOKYO PHNOM PENH SAIGON TAIPEI PEKING VIENTIANE
key actors cables entities events
representing cables
Latent Dirichlet allocation. Blei, Ng, and Jordan, 2003.
documents sent: SAIGON
documents: $\theta_1, \theta_2, \theta_3, \ldots, \theta_d$
typical concerns: $\phi_{0k} \sim \mathrm{Gamma}(\alpha_\phi, \mu_\phi / \alpha_\phi)$
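A short worked note on the prior over typical concerns. The slides do not say which Gamma parameterization is meant; assuming the shape-scale form, writing the second argument as $\mu_\phi / \alpha_\phi$ makes $\mu_\phi$ the mean, and $\alpha_\phi$ controls how tightly the draws concentrate around it:

```latex
\mathbb{E}[\phi_{0k}] = \alpha_\phi \cdot \frac{\mu_\phi}{\alpha_\phi} = \mu_\phi,
\qquad
\operatorname{Var}[\phi_{0k}] = \alpha_\phi \left( \frac{\mu_\phi}{\alpha_\phi} \right)^{2} = \frac{\mu_\phi^{2}}{\alpha_\phi}
```

Under the same assumption, the same reading applies to the other Gamma draws in the model that share this mean-over-shape form.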
documents sent SAIGON typical concerns events 1973 1977
modeling events (1973 to 1977)
WHEN? $\epsilon_i \sim \mathrm{Poisson}(\eta_\epsilon)$
WHAT? $\pi_{ik} \sim \mathrm{Gamma}(\alpha_\pi, \mu_\pi / \alpha_\pi)$
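The two draws above can be sketched as a small generative simulation. This is a minimal illustration, not the authors' code: it assumes that $i$ indexes discrete time intervals, that the Gamma is shape-scale, and all names (`sample_poisson`, `sample_events`) and hyperparameter values are hypothetical.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) using Knuth's multiplication method.

    Suitable for small rates, which is all this sketch needs.
    """
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_events(num_intervals, num_topics, eta, alpha_pi, mu_pi, seed=0):
    """Sample (occurrence, content) for each time interval i.

    occurrence ~ Poisson(eta)                  (WHEN?)
    content[k] ~ Gamma(alpha_pi, mu_pi/alpha_pi), assumed shape-scale (WHAT?)
    """
    rng = random.Random(seed)
    events = []
    for _interval in range(num_intervals):
        occurrence = sample_poisson(eta, rng)
        # random.gammavariate takes (shape, scale), so the mean is mu_pi.
        content = [
            rng.gammavariate(alpha_pi, mu_pi / alpha_pi)
            for _topic in range(num_topics)
        ]
        events.append((occurrence, content))
    return events


# Hypothetical settings: 5 intervals, 3 topics, a low event rate.
events = sample_events(num_intervals=5, num_topics=3, eta=0.1,
                       alpha_pi=1.0, mu_pi=1.0)
```

A small `eta` encodes the expectation that most intervals contain no event, which is one plausible reading of the Poisson prior on the slide.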
modeling cables (1973 to 1977)
$\phi_{jk} = \phi_{0k} + \sum_i f(a_i, c_j)\, \pi_{ik}$
• $\phi_{0k}$: typical concerns
• $\sum_i$: sum over all events
• $f(a_i, c_j)$: decay of relevancy
• $\pi_{ik}$: event description
$\theta_{jk} \sim \mathrm{Gamma}(\alpha_\theta, \phi_{jk} / \alpha_\theta)$
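The cable equation can be illustrated with a few lines of code. The slides do not specify the decay function $f$, so the linear decay below is purely an assumption for illustration, as are the function names, the `duration` parameter, and the use of plain numbers for event time $a_i$ and cable time $c_j$.

```python
import random


def linear_decay(event_time, cable_time, duration):
    """Hypothetical f(a_i, c_j): 1 at the event, falling linearly to 0.

    Cables sent before the event, or `duration` or more after it,
    get zero weight.
    """
    elapsed = cable_time - event_time
    if elapsed < 0 or elapsed >= duration:
        return 0.0
    return 1.0 - elapsed / duration


def cable_concerns(phi_0, events, cable_time, duration):
    """Compute phi_jk = phi_0k + sum_i f(a_i, c_j) * pi_ik for one cable j.

    `events` is a list of (event_time, pi_i) pairs, with pi_i a list
    of per-topic event descriptions.
    """
    phi_j = list(phi_0)
    for event_time, pi_i in events:
        weight = linear_decay(event_time, cable_time, duration)
        for k, pi_ik in enumerate(pi_i):
            phi_j[k] += weight * pi_ik
    return phi_j


def sample_theta(phi_j, alpha_theta, rng):
    """Draw theta_jk ~ Gamma(alpha_theta, phi_jk / alpha_theta), shape-scale assumed."""
    return [rng.gammavariate(alpha_theta, phi_jk / alpha_theta)
            for phi_jk in phi_j]


# Two topics; one event at time 0 about topic 0, one at time 10 about topic 1.
phi_j = cable_concerns(
    phi_0=[1.0, 2.0],
    events=[(0, [4.0, 0.0]), (10, [0.0, 8.0])],
    cable_time=2,
    duration=4,
)
theta_j = sample_theta(phi_j, alpha_theta=2.0, rng=random.Random(0))
```

In this example the cable is sent halfway through the first event's window, so that event contributes half of its description to the cable's expected topics, while the later event contributes nothing.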
How do we find the values of the hidden parameters that best fit the data?
observed data + model assumptions → black box variational inference → learned parameters (entity typical concerns, event occurrence, event content) → exploration
Black box variational inference. Ranganath, Gerrish, and Blei, 2014.
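To give a feel for the "black box" idea, here is a minimal sketch of a score-function gradient estimate of the evidence lower bound on a toy problem. It is not the cables model and not the authors' implementation: the toy model (a Gaussian mean with one observation), the fixed variational standard deviation, and every name and setting below are assumptions, and the sketch omits any variance reduction.

```python
import math
import random


def log_joint(z, x):
    """Toy model (assumption): z ~ N(0, 1), x | z ~ N(z, 1)."""
    return -0.5 * z * z - 0.5 * (x - z) ** 2 - math.log(2 * math.pi)


def log_q(z, mean, std):
    """Log density of the variational family q(z) = N(mean, std^2)."""
    return (-0.5 * ((z - mean) / std) ** 2
            - math.log(std) - 0.5 * math.log(2 * math.pi))


def fit_mean(x, std, steps=300, num_samples=100, step_size=0.05, seed=0):
    """Fit the mean of q by stochastic gradient ascent on the ELBO.

    The gradient is estimated from samples of q only:
        grad ~= average of score(z) * (log p(x, z) - log q(z)),
    where score(z) is the derivative of log q(z) with respect to the mean.
    """
    rng = random.Random(seed)
    mean = 0.0
    for _step in range(steps):
        grad = 0.0
        for _sample in range(num_samples):
            z = rng.gauss(mean, std)
            score = (z - mean) / std ** 2
            grad += score * (log_joint(z, x) - log_q(z, mean, std))
        mean += step_size * grad / num_samples
    return mean


# For x = 2 the exact posterior of this toy model is N(1, 1/2),
# so the fitted mean should land near 1.
fitted_mean = fit_mean(x=2.0, std=math.sqrt(0.5))
```

The point of the sketch is that `fit_mean` only ever evaluates `log_joint`; it never needs model-specific derivatives, which is what lets the same inference routine be reused as model assumptions change.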
validation
• compare discovered events to manually collected examples of known historical events (and corresponding cables)
  • How many of the known events are recovered?
  • How does the average topic distribution of the known cables compare to the discovered event distribution?
• present the discovered events (date, topic distribution, and entities involved) to an expert historian
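The two quantitative questions above could be computed along the following lines. The slides name no metrics, so the date tolerance window and the use of cosine similarity are assumptions, and all function names, dates, and topic vectors are made up for illustration.

```python
import math
from datetime import date


def recall_of_known_events(known_dates, discovered_dates, tolerance_days=3):
    """Fraction of known event dates with a discovered event within the tolerance."""
    if not known_dates:
        return 0.0
    recovered = sum(
        1 for known in known_dates
        if any(abs((known - found).days) <= tolerance_days
               for found in discovered_dates)
    )
    return recovered / len(known_dates)


def average_distribution(topic_vectors):
    """Element-wise mean of the topic distributions of the known cables."""
    count = len(topic_vectors)
    return [sum(column) / count for column in zip(*topic_vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between two topic vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up example: two known events, one of which is matched within 3 days.
known = [date(1975, 1, 10), date(1976, 6, 1)]
discovered = [date(1975, 1, 12), date(1977, 3, 3)]
recall = recall_of_known_events(known, discovered)

# Made-up topic vectors for the cables tied to one known event.
known_cable_topics = [[1.0, 0.0], [0.0, 1.0]]
average_topics = average_distribution(known_cable_topics)
similarity = cosine_similarity(average_topics, [0.5, 0.5])
```

The tolerance window matters: too small and near-misses count as failures, too large and unrelated events count as matches, so any reported recall should state the window used.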
exploration: Saigon, typical concerns $\phi_0$
exploration: Saigon, event descriptions $\pi_i$
results summary
• topic models can describe documents, but they cannot identify when events occur
• we explicitly model event occurrences and event content
• our model can be used to identify and explore events
future work
• Main next step: share events between entities
• Other areas:
  • include interactions between entities
  • learn event duration
  • explore different event decay shapes
  • thorough model validation
Thank you! Questions and suggestions welcome.