FRIENDSHIPS, RIVALRIES, AND TRYSTS
Chenhao Tan, Dallas Card (CMU), Noah Smith (UW)
RELATIONS
• rivals: pro-choice, pro-life
• rivals: undocumented immigrants, illegal alien
• friends: small government, free market
• friends: word alignment, machine translation
Chong and Druckman, 2007; Dawkins, 1976; Entman, 1993; Gitlin, 1980; Lakoff, 2014; Milton, 1964
• First quantitative framework to systematically describe relations between ideas
• Demonstrate effective explorations with this framework on a wide range of datasets
[Figure: example relations. rivals: undocumented immigrants, illegal alien; friends: free market, small government]
Our focus is on relations between ideas. We will use standard approaches:
• Topics as ideas (Hall et al. 2008)
– Topics from latent Dirichlet allocation (Blei et al. 2003)
• Keywords as ideas (Culturomics, Michel et al. 2011)
– Keywords (Monroe et al. 2008)
QUANTITATIVELY
• Given a corpus of documents over time, each document consists of a set of ideas
– Cooccurrence: pointwise mutual information [Church and Hanks 1990]
– Example: the rivals "undocumented immigrants" and "illegal alien" rarely cooccur
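As a minimal sketch of the cooccurrence measure, the function below computes pointwise mutual information for one pair of ideas. It assumes that probabilities are estimated as document-level relative frequencies and applies no smoothing; the function name `pmi` and the toy corpus are invented for illustration and are not taken from the talk.

```python
import math


def pmi(documents, idea_a, idea_b):
    """Pointwise mutual information of two ideas over a list of idea sets.

    Probabilities are estimated as document-level relative frequencies,
    which is an assumption of this sketch.
    """
    n_docs = len(documents)
    n_a = sum(1 for ideas in documents if idea_a in ideas)
    n_b = sum(1 for ideas in documents if idea_b in ideas)
    n_ab = sum(1 for ideas in documents if idea_a in ideas and idea_b in ideas)
    if n_a == 0 or n_b == 0:
        raise ValueError("both ideas must occur at least once")
    if n_ab == 0:
        return float("-inf")  # the pair never cooccurs; no smoothing here
    # log( P(a, b) / (P(a) * P(b)) ), with each P estimated as count / n_docs
    return math.log((n_ab * n_docs) / (n_a * n_b))


# Invented toy corpus: each document is a set of ideas.
toy_documents = [
    {"undocumented immigrants", "deportation"},
    {"undocumented immigrants"},
    {"illegal alien", "deportation"},
    {"illegal alien"},
    {"undocumented immigrants", "illegal alien"},
]
rivals_pmi = pmi(toy_documents, "undocumented immigrants", "illegal alien")
print(rivals_pmi)  # negative: the pair cooccurs less often than chance predicts
```

A negative value corresponds to "rarely cooccur" and a positive value to "likely to cooccur".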
QUANTITATIVELY
• Given a corpus of documents over time, each document consists of a set of ideas
– Cooccurrence does not capture which is winning or losing
– Pearson correlation between the frequencies of two ideas over time
[Figure: frequency over time of "undocumented immigrants" and "illegal alien"]
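The temporal measure can be sketched as follows, assuming that an idea's frequency in a time step is the fraction of that step's documents containing it. The helper names `prevalence_series` and `pearson` and the dated toy corpus are hypothetical.

```python
import math
from collections import defaultdict


def prevalence_series(dated_documents, idea):
    """Fraction of documents per time step that contain the idea.

    dated_documents: list of (time_step, set_of_ideas) pairs.
    Returns one value per time step, in sorted time order.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for step, ideas in dated_documents:
        totals[step] += 1
        if idea in ideas:
            hits[step] += 1
    return [hits[step] / totals[step] for step in sorted(totals)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Invented toy corpus: one idea declines while the other rises.
toy_dated = [
    (1980, {"illegal alien"}), (1980, {"illegal alien"}),
    (1990, {"illegal alien"}), (1990, {"undocumented immigrants"}),
    (2000, {"undocumented immigrants"}), (2000, {"undocumented immigrants"}),
]
rising = prevalence_series(toy_dated, "undocumented immigrants")
falling = prevalence_series(toy_dated, "illegal alien")
toy_correlation = pearson(rising, falling)
print(toy_correlation)  # negative: one idea gains while the other loses
```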
QUANTITATIVELY
• Given a corpus of documents over time, each document consists of a set of ideas
– Cooccurrence: within-document
– Prevalence correlation: across-document
RARELY COOCCUR
[Figure: frequency over time, 1980 to 2010, of "immigrant, undocumented" and "illegal, alien"]
                  Anti-correlated   Correlated
Always cooccur    Tryst             Friendship
Rarely cooccur    Head-to-head      Arms-race
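The four relation types can be read off the signs of the two measures. The sketch below assumes a threshold of zero on both PMI and prevalence correlation, which is a simplification chosen for illustration; the function name `relation_type` is hypothetical.

```python
def relation_type(pmi_value, correlation):
    """Map a (PMI, prevalence correlation) pair to one of four relation types.

    Thresholding both measures at zero is an assumption of this sketch.
    """
    cooccur = pmi_value > 0        # tend to appear in the same documents
    correlated = correlation > 0   # frequencies rise and fall together
    if cooccur and correlated:
        return "friendship"
    if cooccur:
        return "tryst"
    if correlated:
        return "arms-race"
    return "head-to-head"


# Invented values, one per quadrant.
for scores in [(0.4, 0.8), (0.4, -0.8), (-0.4, 0.8), (-0.4, -0.8)]:
    print(scores, relation_type(*scores))
```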
Friendship: always cooccur, correlated
LIKELY TO COOCCUR
[Figure: frequency over time, 1980 to 2010, of "immigrant, undocumented" and "obama, president"]
Arms-race: rarely cooccur, correlated
RARELY COOCCUR
[Figure: frequency over time, 1980 to 2010, of "immigration, deportation" and "republican, party"]
Tryst: always cooccur, anti-correlated
LIKELY TO COOCCUR
[Figure: frequency over time, 1980 to 2010, of "immigration, deportation" and "detainee, detention"]
We have shown a framework to quantitatively describe relations between ideas.
Can we use it to effectively explore relations between ideas?
• Newspapers and research articles as datasets
– Immigration
– Terrorism
– Same-sex marriage
– Abortion
– Tobacco
– ACL
– NIPS
Correlated, but many pairs in all four quadrants!
[Figure: scatter plot of cooccurrence (y-axis, -0.6 to 0.6) against prevalence correlation (x-axis, -1.0 to 1.0); pearsonr = 0.55]
Strength = |PMI| × |correlation|
Extreme pairs are the interesting ones!
[Figure: scatter plot of cooccurrence (y-axis, -0.6 to 0.6) against prevalence correlation (x-axis, -1.0 to 1.0); pearsonr = 0.55]
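A short sketch of how the strength score could be used to rank pairs so that the extreme ones come first. The helper names `strength` and `rank_by_strength` and the input values are invented for illustration.

```python
def strength(pmi_value, correlation):
    """Strength = |PMI| x |correlation|, as defined on the slide."""
    return abs(pmi_value) * abs(correlation)


def rank_by_strength(pair_scores):
    """Return pairs sorted from strongest to weakest relation.

    pair_scores: dict mapping a pair of ideas to (PMI, correlation).
    """
    return sorted(
        pair_scores,
        key=lambda pair: strength(*pair_scores[pair]),
        reverse=True,
    )


# Invented scores for three hypothetical pairs of ideas.
toy_scores = {
    ("a", "b"): (0.5, -0.9),   # strong on both measures
    ("c", "d"): (0.1, 0.2),    # weak on both measures
    ("e", "f"): (-0.4, 0.8),   # strong, opposite signs
}
ranking = rank_by_strength(toy_scores)
print(ranking)
```

Because the score multiplies the two absolute values, a pair must be extreme on both measures to rank highly; a pair that is extreme on only one of them is pulled down by the other.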
• Terrorism
– Keywords
– Topics
[Figure: frequency over time, 1980 to 2010, of the keywords "arab" and "islam"; y-axis: frequency, 0 to 0.3]
The relations between these topics are consistent with structural balance theory: the enemy of an enemy is a friend [Cartwright and Harary, 1956; Heider, 1946]
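The balance condition can be sketched as a sign check on a triad: a triangle is balanced when the product of its three edge signs is positive. Encoding a friendly relation as +1 and a rival relation as -1 is an assumption made for this illustration, and the function name `is_balanced` is hypothetical.

```python
def is_balanced(sign_ab, sign_bc, sign_ac):
    """Return True if a signed triad is structurally balanced.

    Each sign is +1 (friends) or -1 (enemies); this encoding is an
    assumption of the sketch. A triad is balanced when the product of
    its three signs is positive.
    """
    for sign in (sign_ab, sign_bc, sign_ac):
        if sign not in (1, -1):
            raise ValueError("signs must be +1 or -1")
    return sign_ab * sign_bc * sign_ac > 0


# "The enemy of an enemy is a friend": A and B are enemies, B and C are
# enemies, A and C are friends.
print(is_balanced(-1, -1, 1))
```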
Rank among all relations

          Idea pair                                PMI   Correlation   Joint
Keywords  arab / islam                             106   1,494         2
Topics    afghanistan, taliban / federal, state    43    99            2
Topics    federal, state / iran, libya             36    56            2

The “interesting” pair is ranked much higher according to our framework.
[Figure: four panels of frequency over time, 1980 to 2010, one per relation type]
• Tryst (always cooccur, anti-correlated): machine translation; rule,forest methods
• Friendship (always cooccur, correlated): machine translation; word alignment
• Head-to-head (rarely cooccur, anti-correlated): machine translation; sentiment analysis
• Arms-race (rarely cooccur, correlated): machine translation; discourse (coherence)
https://github.com/nwrush/Visualization
Thank you!
• A quantitative way to describe relations between ideas: friendships, head-to-head, arms-race, tryst
• An effective framework to explore temporal text corpora
[Figure: the four relation types arranged by cooccurrence and prevalence correlation]
chenhao@chenhaot.com, Twitter: @ChenhaoTan
Data & code: https://chenhaot.com/papers/idea-relations.html