Generalists and Specialists: Using Community Embeddings to Quantify Activity Diversity in Online Platforms
Isaac Waller (walleris@cs.toronto.edu) and Ashton Anderson (ashton@cs.toronto.edu), University of Toronto
The Web Conference 2019
Generalists and specialists
full-stack developer vs. React developer
family doctor vs. neurosurgeon
generalist vs. specialist
Generalists and specialists
vulture: generalist; koala: specialist
Koala photo by DAVID ILIFF. License: CC-BY-SA 3.0. Vulture photo by Charles Sharp. License: CC-BY-SA 4.0.
Reddit
Communities such as Games, MakeupAddiction, medicalschool, soccer, math, programming, Cartalk, chromeos, Construction, funny, television, Aquariums
Which is the specialist?
User 1: C = {China, nba, Buddhism, startrek}
User 2: C = {Fitness, powerlifting, bodybuilding, weightroom}
GS(C) = ?
Word2vec [1]
[1] Mikolov et al. (2013). Distributed Representations of Words and Phrases and their Compositionality.
Word2vec for communities [2, 3]
Input: a (community, user) pair for each comment made in a community, e.g. (Games, user1), (Fitness, user3), (medicalschool, user2), (China, user4), (Science, user2), (weightlifting, user3)
Output: a vector for each community in the input, where communities with high user overlap are closer to each other
[2] Kumar et al. (2018). Community Interaction and Conflict on the Web.
[3] Martin (2017). community2vec: Vector representations of online communities encode semantic relationships.
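A minimal sketch of the word2vec-for-communities idea, assuming skip-gram with negative sampling in which communities play the role of words and users play the role of contexts. Every detail here (the function names, vector size, learning rate, epoch count, and uniform negative sampling over users) is our own illustrative choice, not the training setup behind these slides; in practice one would use an optimized word2vec implementation.

```python
import math
import random


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def train_community_vectors(pairs, dim=8, epochs=100, lr=0.05, negatives=3, seed=0):
    """Skip-gram with negative sampling on (community, user) pairs.

    Communities are the "words" and users the "contexts", so communities that
    share many users are pulled toward the same user vectors and end up close.
    Negatives are drawn uniformly from users (a simplification of word2vec's
    smoothed unigram distribution).
    """
    rng = random.Random(seed)
    pairs = list(pairs)  # copy, so shuffling does not reorder the caller's list
    communities = sorted({c for c, _ in pairs})
    users = sorted({u for _, u in pairs})
    # Community vectors start small and random; user (context) vectors start at zero.
    W = {c: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for c in communities}
    U = {u: [0.0] * dim for u in users}
    for _ in range(epochs):
        rng.shuffle(pairs)
        for c, u in pairs:
            # The observed user is a positive example; random users are negatives.
            samples = [(u, 1.0)] + [(rng.choice(users), 0.0) for _ in range(negatives)]
            grad_c = [0.0] * dim
            for ctx, label in samples:
                g = lr * (label - sigmoid(dot(W[c], U[ctx])))
                for i in range(dim):
                    grad_c[i] += g * U[ctx][i]  # uses the context vector before its update
                    U[ctx][i] += g * W[c][i]
            for i in range(dim):
                W[c][i] += grad_c[i]
    return W


if __name__ == "__main__":
    # Toy log: one (community, user) pair per comment.
    comments = [(c, f"u{i}") for i in range(10) for c in ("Fitness", "powerlifting", "bodybuilding")]
    comments += [(c, f"u{i}") for i in range(10, 20) for c in ("China", "Buddhism")]
    vecs = train_community_vectors(comments)
    print(cosine(vecs["Fitness"], vecs["powerlifting"]), cosine(vecs["Fitness"], vecs["China"]))
```

In the toy log, Fitness, powerlifting, and bodybuilding share the same ten users, so their vectors should end up closer to each other than to China or Buddhism.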
A first embedding
Word analogies
verb tense; male to female
Community analogies
sports team to sport / city; university to city
Example community analogy pairs (4,392 analogies total): PolkStateCollege → WinterHaven, csun → LosAngeles, Coyotes → phoenix, AnaheimDucks → LosAngeles, FLC → folsom, OxfordBrookes → oxford, phillies → philadelphia, Torontobluejays → toronto, oaklandraiders → oakland, Colts → indianapolis, brocku → stcatharinesON, uakron → akron, nus → singapore, UMT → missoula, angelsbaseball → baseball, LAClippers → nba
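Analogies such as phillies : philadelphia :: Torontobluejays : toronto are commonly solved with the vector-offset method from the word2vec literature: compute b - a + c and return the nearest remaining vector by cosine similarity. A minimal sketch, assuming community vectors are plain Python lists; the four-dimensional toy vectors are invented for illustration and are not real embeddings.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as dissimilar to everything


def solve_analogy(vectors, a, b, c):
    """Complete a : b :: c : ? with the vector-offset method.

    The answer is the community whose vector is most cosine-similar to
    vec(b) - vec(a) + vec(c), excluding the three query communities.
    """
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [name for name in vectors if name not in {a, b, c}]
    return max(candidates, key=lambda name: cosine(vectors[name], target))


# Invented toy vectors; dimensions are (team, city, Philadelphia, Toronto).
toy = {
    "phillies": [1, 0, 1, 0],
    "philadelphia": [0, 1, 1, 0],
    "Torontobluejays": [1, 0, 0, 1],
    "toronto": [0, 1, 0, 1],
    "nba": [1, 0, 0, 0],
}
print(solve_analogy(toy, "phillies", "philadelphia", "Torontobluejays"))  # -> toronto
```

Excluding the three query communities matters: without it, the offset vector is often closest to one of its own inputs.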
Hyperparameter search
Analogy accuracy over embedding size (100 to 200) and alpha (0.16 to 0.22); color scale from 23.81% to 93.30%
Result: 72% perfect, 93% top 5
Example: cycling + swimming + running = triathlon
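The "perfect" and "top 5" figures can be read as top-1 and top-5 accuracy over the analogy set, and a query like cycling + swimming + running is a nearest-neighbour lookup on the summed vectors. A minimal sketch of both, assuming the vector-offset method (rank communities by cosine similarity to b - a + c, excluding a, b, c); the function names and toy vectors are invented for illustration. A grid search would rerun this evaluation for each (size, alpha) setting.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(vectors, query, exclude=(), k=1):
    """Names of the k vectors most cosine-similar to `query`, skipping `exclude`."""
    candidates = [name for name in vectors if name not in exclude]
    candidates.sort(key=lambda name: cosine(vectors[name], query), reverse=True)
    return candidates[:k]


def analogy_accuracy(vectors, analogies, k=5):
    """Return (perfect, top_k) accuracy for analogies given as (a, b, c, d),
    meaning a : b :: c : d, using the query vec(b) - vec(a) + vec(c)."""
    perfect = top_k = 0
    for a, b, c, d in analogies:
        query = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
        ranked = nearest(vectors, query, exclude={a, b, c}, k=k)
        if ranked and ranked[0] == d:
            perfect += 1
        if d in ranked:
            top_k += 1
    return perfect / len(analogies), top_k / len(analogies)


# Invented toy vectors; dimensions are (cycling, swimming, running, endurance).
sports = {
    "cycling": [1, 0, 0, 1],
    "swimming": [0, 1, 0, 1],
    "running": [0, 0, 1, 1],
    "triathlon": [1, 1, 1, 1],
    "soccer": [0, 0, 1, 0],
}
summed = [sum(dims) for dims in zip(sports["cycling"], sports["swimming"], sports["running"])]
print(nearest(sports, summed, exclude={"cycling", "swimming", "running"}))  # -> ['triathlon']
```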
Our better embedding
Back to generalists and specialists
User 1: C = {China, nba, Buddhism, startrek}
User 2: C = {Fitness, powerlifting, bodybuilding, weightroom}
GS(C) = ?
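GS-score: a scale from generalist (low) to specialist (high)
$$GS(C) = \frac{1}{|C|} \sum_{c \in C} w_c \cos(c, \mu)$$
where C is the set of communities a user is active in, c is a community's vector, w_c is the weight of community c, and µ is the center of mass of the vectors in C.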
GS-score of the two example users
User 1: GS({China, nba, Buddhism, startrek}) = 0.69 (24th percentile)
User 2: GS({Fitness, powerlifting, bodybuilding, weightroom}) = 0.89 (72nd percentile)
User 2, with the higher score, is the specialist.
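A minimal sketch of the GS-score formula GS(C) = (1/|C|) Σ w_c cos(c, µ), under assumptions that are ours rather than the slides': µ is taken to be the weighted mean of the community vectors, and the weights w_c (for example, a user's comment counts per community) are rescaled to average 1 over C so the score stays between -1 and 1. With all weights equal, the score is simply the mean cosine similarity between each community and the user's center of mass.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vectors contribute no similarity


def gs_score(vectors, weights=None):
    """GS(C) = (1/|C|) * sum over c in C of w_c * cos(c, mu).

    vectors: one embedding per community in C.
    weights: per-community activity (assumed); rescaled here to average 1.
    """
    n = len(vectors)
    raw = [1.0] * n if weights is None else [float(w) for w in weights]
    total = sum(raw)
    w = [x * n / total for x in raw]  # now sum(w) == n, i.e. mean weight 1
    dim = len(vectors[0])
    # Assumed center of mass: the weighted mean of the community vectors.
    mu = [sum(wc * v[i] for wc, v in zip(w, vectors)) / n for i in range(dim)]
    return sum(wc * cosine(v, mu) for wc, v in zip(w, vectors)) / n


# Two orthogonal communities with equal activity vs. the same two with skewed activity.
spread = gs_score([[1, 0], [0, 1]])            # about 0.707
focused = gs_score([[1, 0], [0, 1]], [3, 1])   # about 0.791, closer to specialist
print(spread, focused)
```

Concentrating activity in fewer directions pulls µ toward those communities and raises the score, which matches reading high GS-scores as specialists.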
Data
Reddit: all comments in 2017, 900M comments from 11.4M distinct users; top 10,000 subreddits by activity. Source: pushshift.io
GitHub: all commits, pull requests, forks, watches, and stars in 2017, 413M actions from 8.3M distinct users; top 40,000 repos by number of stars. Source: gharchive.org
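Restricting the data to the most active communities before embedding can be sketched as a count-and-filter pass over the (community, user) pairs. A minimal sketch, assuming "activity" means comment count and that the log fits in memory as a list of tuples; the function names are hypothetical. For GitHub, where the slides rank repos by number of stars, the counter would tally star events instead.

```python
from collections import Counter


def top_communities(comments, n):
    """The n communities with the most comments, most active first."""
    counts = Counter(community for community, _user in comments)
    return [community for community, _count in counts.most_common(n)]


def training_pairs(comments, n):
    """Keep only the (community, user) pairs whose community is in the top n."""
    keep = set(top_communities(comments, n))
    return [(community, user) for community, user in comments if community in keep]


comments = [
    ("Games", "user1"), ("Games", "user2"), ("Games", "user1"),
    ("math", "user3"), ("math", "user1"),
    ("soccer", "user4"),
]
print(top_communities(comments, 2))        # -> ['Games', 'math']
print(len(training_pairs(comments, 2)))    # -> 5
```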
Results
Histograms of user GS-scores (frequency vs. GS-score, 0.6 to 1.0) on Reddit (left) and GitHub (right)
Results
Specialists stay engaged with communities longer, but generalists stay engaged with the platform longer
Figures: P(stay for >= 6 months) vs. user's GS-score; P(remaining on platform) vs. activity (# of comments), split by quartile (1st to 4th)
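The retention curves compare groups of users by GS-score. A minimal sketch of one such comparison, assuming each user is summarized as a (GS-score, months active) pair, which is a hypothetical format; the slides do not spell out how "staying" is measured. Users are sorted by GS-score, split into four equal-size quartiles, and the fraction active for at least six months is reported per quartile.

```python
def retention_by_quartile(users, min_months=6):
    """Fraction of users active for at least `min_months` months in each
    GS-score quartile, ordered from most generalist (1st) to most specialist (4th).

    users: list of (gs_score, months_active) tuples (hypothetical format).
    """
    ranked = sorted(users, key=lambda user: user[0])
    n = len(ranked)
    rates = []
    for q in range(4):
        quartile = ranked[q * n // 4:(q + 1) * n // 4]
        stayed = sum(1 for _gs, months in quartile if months >= min_months)
        rates.append(stayed / len(quartile) if quartile else float("nan"))
    return rates


users = [(0.60, 12), (0.65, 9), (0.70, 7), (0.75, 3),
         (0.80, 6), (0.85, 2), (0.90, 1), (0.95, 4)]
print(retention_by_quartile(users))  # -> [1.0, 0.5, 0.5, 0.0]
```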
Results
On Reddit, specialists tend to make more exceptional comments
Figure: P(score > parent) vs. percentile of author GS-score
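P(score > parent) is the chance that a comment scores higher than the comment it replies to, plotted against the author's GS-score percentile. A minimal sketch, assuming comments are given as hypothetical (author, score, parent_score) tuples and author GS-scores as a dict; the 20-point percentile bins are our choice, not the slide's.

```python
import math
from bisect import bisect_right
from collections import defaultdict


def percentile_ranks(author_gs):
    """Percentile (in (0, 100]) of each author's GS-score among all authors."""
    ordered = sorted(author_gs.values())
    n = len(ordered)
    return {a: 100.0 * bisect_right(ordered, s) / n for a, s in author_gs.items()}


def p_beats_parent(comments, author_gs, bin_width=20):
    """P(score > parent_score) per author-percentile bin, keyed by the bin's lower edge."""
    pct = percentile_ranks(author_gs)
    wins, totals = defaultdict(int), defaultdict(int)
    for author, score, parent_score in comments:
        b = max(0, math.ceil(pct[author] / bin_width) - 1)  # e.g. (0, 20] -> bin 0
        totals[b] += 1
        wins[b] += int(score > parent_score)
    return {b * bin_width: wins[b] / totals[b] for b in sorted(totals)}


author_gs = {"a": 0.60, "b": 0.70, "c": 0.80, "d": 0.90, "e": 0.95}
comments = [("a", 5, 10), ("a", 12, 10), ("c", 1, 1), ("e", 30, 10), ("e", 40, 10)]
print(p_beats_parent(comments, author_gs))  # -> {0: 0.5, 40: 0.0, 80: 1.0}
```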
Results
But generalists are exposed to a more diverse set of users
Figure: parent-universe GS-score vs. user's GS-score
Results
Can GS-score predict new communities a user joins?
Figure: mean average precision vs. user GS-score percentile for center-of-mass NN, collaborative filtering, popularity, and random
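One plausible reading of the "Center-of-mass NN" recommender: average the vectors of the communities a user is active in, then recommend the nearest communities they have not joined yet, and score each ranking with average precision (averaged over users to get mean average precision). A minimal sketch under that reading; the activity weighting, the toy 2-D vectors, and the function names are our assumptions, and the collaborative-filtering, popularity, and random baselines are not shown. AP here is normalized by the number of relevant communities, one of several common variants.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def center_of_mass(vectors, counts):
    """Activity-weighted mean of the vectors of the communities a user is active in."""
    total = sum(counts.values())
    dim = len(next(iter(vectors.values())))
    return [sum(n * vectors[c][i] for c, n in counts.items()) / total for i in range(dim)]


def recommend(vectors, counts, k=10):
    """Rank communities the user has not joined by cosine similarity to their center of mass."""
    mu = center_of_mass(vectors, counts)
    candidates = [c for c in vectors if c not in counts]
    return sorted(candidates, key=lambda c: cosine(vectors[c], mu), reverse=True)[:k]


def average_precision(ranked, relevant):
    """AP of a ranked list against the set of communities the user actually joined."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(results):
    """Mean AP over (ranked, relevant) pairs, one per user."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)


vectors = {
    "Fitness": [1.0, 0.0], "powerlifting": [0.9, 0.1], "bodybuilding": [0.95, 0.05],
    "China": [0.0, 1.0], "Buddhism": [0.1, 0.9],
}
print(recommend(vectors, {"Fitness": 5, "powerlifting": 3}, k=2))  # -> ['bodybuilding', 'Buddhism']
```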
Community GS-scores
Figures: community GS-score by quartile (1st to 4th), over 2015 to 2018 and month by month within 2017
In summary
Users on Reddit and GitHub range from generalist to specialist.
Specialists stay engaged with individual communities longer, but generalists stay engaged with the platform longer.
On Reddit, specialists are more likely to make exceptional comments.
Specialists are significantly more predictable than generalists.
Recommendation
More on recommendation