Human Social Dynamics vs. The Data We Can Get
Aaron Clauset
Assistant Professor, Computer Science and BioFrontiers Institute, University of Colorado Boulder
External Faculty, Santa Fe Institute
4 June 2013
NetSci 2013 Social Dynamics Workshop

data sources for social network dynamics
• Twitter, Facebook, Google+, Pinterest, etc.
• Academic coauthorships & citations
• World Wide Web
• etc.
but...
• links often have low or no cost = unrealistic
• system structure drives social dynamics
• few sources capture “real” social networks (face-to-face time)

the more general problem
the data we want [detailed, individual traces, over time, about specific social processes]
are not
the data we can get [detailed, massive electronic traces that may be only vaguely relevant to any real human social processes]

an illustration
Halo: Reach (Bungie, 2010)
• played online via Xbox Live platform
• team combat simulation (FPS)
• 20 TB of game data, spanning
  • 18 months of time
  • 17+ million players
  • 1 billion competitions
  • 70% are team competitions
• complex spatial environments
• complex social interactions

how it works
• join “party” (of 0-3 friends)
• choose game type and subtype (“competitive / team 4v4”)
• Xbox Live places parties into matches (matchmaking)
• play! (for roughly 10 minutes)
• repeat (1 billion times)

the problem
• observed interactions = F(game matchmaking, friendships)
• mean interaction degree ≈ 330
• how to distinguish latent friendships from observed interactions?
• plus, no demographic information

a small solution
• anonymous web survey
• 847 participants
• demographic questions: age, sex, location, education
• psychometric questions: attitudes, play style, etc.
• friendship survey
  • 14,405 labeled friends
  • 7,159,989 labeled non-friends
Mason and Clauset, CSCW 2013

recovering friendships from interactions
we can observe a sequence of pairwise interactions
$\sigma_{ij} = (i, j, t_1), (i, j, t_2), \dots$
can we robustly distinguish friendships from non-friendships?
problems:
• volume of data varies widely by individual = heavy-tailed distribution in $|\sigma_{ij}|$
• friendships are sparse in large networks
• survey data provide “subjective” truth only

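A minimal sketch of how the per-pair sequences σ_ij and their sizes |σ_ij| might be built from a raw event log. The function name `pair_sequences`, the toy log, and the choice to treat pairs as unordered are assumptions for illustration, not details from the talk.

```python
from collections import defaultdict

def pair_sequences(events):
    """Group (i, j, t) events into a time-sorted list of timestamps per pair."""
    seqs = defaultdict(list)
    for i, j, t in events:
        # Assumption: interactions are undirected, so (i, j) and (j, i) are one pair.
        key = (i, j) if i <= j else (j, i)
        seqs[key].append(t)
    return {pair: sorted(ts) for pair, ts in seqs.items()}

# Hypothetical toy log: player ids and timestamps in hours.
events = [("a", "b", 1.0), ("b", "a", 25.0), ("a", "c", 3.0), ("a", "b", 49.0)]
sigma = pair_sequences(events)

# |sigma_ij|: number of observed interactions per pair.
sizes = {pair: len(ts) for pair, ts in sigma.items()}
```

On real data the values in `sizes` would be expected to vary widely across pairs, which is the heavy-tailed volume problem named on the slide.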
what is a friendship?
social interactions:
• friendship = periodic + prosocial interactions
• diurnal cycle modulates all interactions
recovering latent friendship ties:
• supervised learning
• define 9 statistical features
• which do well?
Merritt, Jacobs, Mason and Clauset, ICWSM 2013

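To make "periodic" concrete, here is one hypothetical way to score how regularly a pair interacts. It is not one of the nine features from the ICWSM paper; the function name `periodicity`, the default period, and the tolerance are all assumptions chosen for illustration.

```python
def periodicity(times_h, period_h=24.0, tol_h=2.0):
    """Fraction of consecutive gaps within tol_h of a nonzero multiple of period_h.

    times_h: interaction timestamps for one pair, in hours (hypothetical units).
    """
    ts = sorted(times_h)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if not gaps:
        return 0.0  # fewer than two interactions: no evidence of periodicity

    def near_multiple(g):
        k = round(g / period_h)  # nearest whole number of periods
        return k >= 1 and abs(g - k * period_h) <= tol_h

    return sum(near_multiple(g) for g in gaps) / len(gaps)

regular = periodicity([20.0, 44.0, 68.5, 92.0])   # gaps of 24, 24.5, 23.5 hours
irregular = periodicity([1.0, 2.0, 10.0, 50.0])   # gaps of 1, 8, 40 hours
```

Because the diurnal cycle modulates all interactions, a daily period alone may not separate friends from strangers; `period_h` is left as a parameter so that other periods (for example weekly) can be tried.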
accuracy vs. amount of interaction data
[figure: AUC vs. volume of interactions; legend: 9 features of pairwise interactions, periodic or prosocial interactions]
• AUC vs. how much data on a person we have
• periodic + prosocial interactions highly robust and efficient
• total interaction count also good, eventually
Merritt, Jacobs, Mason and Clauset, ICWSM 2013

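For reference, AUC for a single feature can be read as the probability that a randomly chosen friend pair scores higher than a randomly chosen non-friend pair. The sketch below computes it by direct pairwise comparison; the function name `auc` and the toy scores are assumptions, and the quadratic loop is only practical for small samples.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical feature values (e.g. total interaction count) for labeled pairs.
friends = [12, 30, 7]
non_friends = [1, 2, 7, 3]
value = auc(friends, non_friends)   # 11.5 of 12 comparisons favor friends
```

An AUC of 0.5 corresponds to a feature that ranks friends no better than chance, and 1.0 to a perfect ranking.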
recovering friendships from interactions
• friendships easy to recover from interactions
• mean degree (interactions) ≈ 330
• mean degree (friendships) ≈ 4
• friendship graph very different from interaction graph
• results likely to generalize [see Jones et al., PLOS ONE (2013)]
• clarifies “friendship” = periodic + prosocial interactions
• privacy concerns
Merritt, Jacobs, Mason and Clauset, ICWSM 2013
Jones et al., PLOS ONE 8(1):e52168 (2013)

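The two mean degrees compared above can be computed the same way from two edge lists over the same players. A minimal sketch, assuming undirected edges and counting only nodes that appear in at least one edge; the function name `mean_degree` and the toy graphs are hypothetical.

```python
def mean_degree(edges):
    """Mean number of distinct neighbors per node in an undirected edge list."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    if not neighbors:
        return 0.0
    # Note: nodes with no edges never enter `neighbors`, so they are not averaged in.
    return sum(len(s) for s in neighbors.values()) / len(neighbors)

# Hypothetical toy graphs: many observed interactions, few friendships.
interactions = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "a")]
friendships = [("a", "b")]
k_interact = mean_degree(interactions)
k_friend = mean_degree(friendships)
```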
general outlook
• electronic data
  • new window into human social dynamics!
  • big and detailed!
  • but, dumb. (not the data we want)
• computational social science
  • know limits of dumb data
  • keep eye on underlying social processes
  • be willing to commit a “social science”
• a general prescription
  1. obtain electronic data (big and dumb)
  2. seek out user labeled data (small and smart)
  3. model latent variables from observed data (supervised)
  4. extract underlying social dynamics

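Step 3 of the prescription, in its simplest possible form, might look like fitting a single cutoff on one feature using the small labeled set. This is a deliberately crude stand-in for the supervised models in the cited papers; the function name `fit_threshold` and the toy data are assumptions.

```python
def fit_threshold(scores, labels):
    """Pick the cutoff on one feature that maximizes accuracy on labeled pairs.

    A pair is predicted to be a friendship when its score is >= the cutoff.
    """
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(scores)):
        correct = sum((s >= cut) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:  # strict '>' keeps the lowest cutoff among ties
            best_cut, best_acc = cut, acc
    return best_cut, best_acc

# Hypothetical feature scores and survey labels (1 = friend, 0 = non-friend).
scores = [0.9, 0.8, 0.2, 0.1, 0.4]
labels = [1, 1, 0, 0, 0]
cut, acc = fit_threshold(scores, labels)
```

Since friendships are sparse, plain accuracy can look good even for a useless classifier, which is one reason a ranking measure such as AUC is the more natural yardstick here.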
thanks
Sears Merritt (Colorado)
Abigail Z Jacobs (Colorado)
Winter Mason (Stevens Inst. Tech.)
funded in part by
references
• Mason and Clauset, “Friends FTW! Friendship, Collaboration and Competition in Halo: Reach.” CSCW (2013).
• Merritt, Jacobs, Mason and Clauset, “Detecting friendship within dynamic online interaction networks.” ICWSM (2013).
• Merritt and Clauset, “Environmental structure and competitive scoring advantages in team competitions.” arXiv:1304.1039 (2013).

fin