
Can Who-Edits-What Predict Edit Survival?
Batuhan Yardım, Victor Kristof, Lucas Maystre, Matthias Grossglauser
Information and Network Dynamics Lab (indy.epfl.ch)
August 23, 2018, KDD18, London

Peer-production systems
Emergence of self-organizing, crowd-sourced projects online.
Distributed vs. centralized production.

Problem
Projects are victims of their own success: problems arise with increasing scale.
[Figure: an article on Alan Turing receiving two edits, « Alan Turing was an English computer scientist… » and « Blah blih bluh!@!? », each marked with « ??? ».]
Predict quality of contributions.
Help project maintainers in their work.
Help users match their interests.

Typical approaches
           | User reputation systems  | INTERANK | Highly specialized predictors
Input      | user scores (42, 58, 23) |          | features (#words, timestamp, user IP)
Complexity | Simple                   | Simple   | Complex
Scope      | General                  | General  | Specialized
Accuracy   | Not accurate             | Accurate | Accurate

Model: INTERANK Experiment: Wikipedia Experiment: Linux

INTERANK: basic variant
Model the probability $p_{ui}$ that an edit made by user $u$ on item $i$ is successful as a game between user $u$ and item $i$ (inspired by Bradley-Terry models):
$$p_{ui} = \frac{1}{1 + \exp[-(s_u - d_i + b)]}, \qquad s_u, d_i, b \in \mathbb{R}$$
Here $s_u$ is the skill of user $u$, $d_i$ is the difficulty of item $i$, and $b$ is a bias.
If $s_u$ increases, $p_{ui}$ increases. If $d_i$ increases, $p_{ui}$ decreases.
Informally:
• Skill quantifies the ability of a user to make a contribution.
• Difficulty quantifies how « resistant » to contributions a particular item is.
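The basic variant can be sketched in a few lines of Python. This is only an illustration of the formula on the slide; the function name and the parameter values are hypothetical, not taken from the authors' code.

```python
import math

def p_success_basic(s_u: float, d_i: float, b: float = 0.0) -> float:
    """Probability that an edit by user u on item i survives (basic variant)."""
    return 1.0 / (1.0 + math.exp(-(s_u - d_i + b)))

# Hypothetical parameter values, chosen only for illustration.
p_equal = p_success_basic(s_u=0.0, d_i=0.0)    # skill equals difficulty, no bias
p_skilled = p_success_basic(s_u=1.0, d_i=0.0)  # higher skill, same item
p_hard = p_success_basic(s_u=0.0, d_i=1.0)     # same user, harder item

print(p_equal, p_skilled, p_hard)
```

When skill equals difficulty and the bias is zero, the probability is exactly 0.5; raising the skill moves it above 0.5 and raising the difficulty moves it below.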

INTERANK: full variant
The basic variant is too simplistic: if user $u$ is more skilled than user $v$, then $p_{ui} > p_{vi}$ for all items $i$.
Need to capture the interactions between users and items:
$$p_{ui} = \frac{1}{1 + \exp[-(s_u - d_i + x_u^\top y_i + b)]}, \qquad x_u, y_i \in \mathbb{R}^D$$
Here $x_u$ is the embedding of user $u$, $y_i$ is the embedding of item $i$, and $D$ is the dimension of the latent space.
If $x_u$ and $y_i$ are close, $p_{ui}$ increases.
Informally:
• $x_u$ describes the set of skills displayed by user $u$.
• $y_i$ describes the set of skills needed to edit item $i$.
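A small sketch may help show why the interaction term matters. With hypothetical two-dimensional embeddings (all values below are made up for illustration), the more skilled user is no longer favored on every item:

```python
import math

def p_success_full(s_u, d_i, x_u, y_i, b=0.0):
    """Full variant: skill, difficulty, bias, plus the inner product x_u^T y_i."""
    interaction = sum(xk * yk for xk, yk in zip(x_u, y_i))
    return 1.0 / (1.0 + math.exp(-(s_u - d_i + interaction + b)))

# Hypothetical users: u has the higher overall skill, v the lower one.
s_u, s_v = 0.5, 0.0
x_u, x_v = [1.0, 0.0], [0.0, 1.0]   # each user is strong on a different latent skill
# Hypothetical items of equal difficulty that need different latent skills.
y_a, y_b = [1.0, 0.0], [0.0, 1.0]

p_ua = p_success_full(s_u, 0.0, x_u, y_a)  # logit 1.5
p_va = p_success_full(s_v, 0.0, x_v, y_a)  # logit 0.0
p_ub = p_success_full(s_u, 0.0, x_u, y_b)  # logit 0.5
p_vb = p_success_full(s_v, 0.0, x_v, y_b)  # logit 1.0

print(p_ua > p_va, p_vb > p_ub)
```

User u is favored on item a, but user v is favored on item b even though v has the lower skill parameter, which the basic variant cannot express.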

INTERANK: learning
A dataset $\mathcal{D}$ of $K$ observations consists of triplets $(u_k, i_k, q_k)$, $k = 1, \ldots, K$.
The outcome $q_k \in \{0, 1\}$ encodes whether an edit by user $u_k$ on item $i_k$ survives.
Negative log-likelihood:
$$-\ell(\theta; \mathcal{D}) = \sum_{(u, i, q) \in \mathcal{D}} \left[ -q \log p_{ui} - (1 - q) \log(1 - p_{ui}) \right]$$
Parameters:
• basic: $\theta = [s_1, \ldots, s_N, d_1, \ldots, d_M]$
• full: $\theta = [s_1, \ldots, s_N, d_1, \ldots, d_M, \{x_{u1}, \ldots, x_{uD}\}_{u=1}^{N}, \{y_{i1}, \ldots, y_{iD}\}_{i=1}^{M}]$
Convexity:
• basic: the negative log-likelihood is convex.
• full: the bilinear term $x_u^\top y_i$ breaks convexity.
In practice:
• We do not observe any convergence issues.
• We reliably find good model parameters using Stochastic Gradient Descent.
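The learning step can be sketched as plain stochastic gradient descent on the negative log-likelihood. This is a minimal sketch under stated assumptions, not the authors' implementation: the synthetic data, learning rate, number of epochs, initialization, and the choice to learn the bias $b$ alongside the other parameters are all hypothetical. It relies on the fact that the derivative of the per-observation loss with respect to the logit is $p_{ui} - q$.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit(params, u, i):
    s, d, x, y, b = params
    return s[u] - d[i] + sum(a * c for a, c in zip(x[u], y[i])) + b

def neg_log_likelihood(params, data, eps=1e-12):
    total = 0.0
    for u, i, q in data:
        p = min(max(sigmoid(logit(params, u, i)), eps), 1.0 - eps)
        total -= q * math.log(p) + (1 - q) * math.log(1.0 - p)
    return total

def train_sgd(data, n_users, n_items, dim=2, lr=0.05, epochs=30, seed=0):
    rng = random.Random(seed)
    s = [0.0] * n_users
    d = [0.0] * n_items
    # Small random embeddings: starting at exactly zero would keep the bilinear term at zero.
    x = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    y = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    b = 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, q in data:
            g = sigmoid(logit((s, d, x, y, b), u, i)) - q  # d(loss)/d(logit)
            s[u] -= lr * g
            d[i] += lr * g  # the logit contains -d_i, hence the opposite sign
            b -= lr * g
            for k in range(dim):
                xk, yk = x[u][k], y[i][k]
                x[u][k] -= lr * g * yk
                y[i][k] -= lr * g * xk
    return s, d, x, y, b

# Hypothetical synthetic data drawn from a basic-variant ground truth.
rng = random.Random(42)
N_USERS, N_ITEMS = 20, 10
true_s = [rng.gauss(0.0, 1.5) for _ in range(N_USERS)]
true_d = [rng.gauss(0.0, 1.5) for _ in range(N_ITEMS)]
data = []
for _ in range(2000):
    u, i = rng.randrange(N_USERS), rng.randrange(N_ITEMS)
    q = 1 if rng.random() < sigmoid(true_s[u] - true_d[i]) else 0
    data.append((u, i, q))

zero_params = ([0.0] * N_USERS, [0.0] * N_ITEMS,
               [[0.0, 0.0] for _ in range(N_USERS)],
               [[0.0, 0.0] for _ in range(N_ITEMS)], 0.0)
nll_before = neg_log_likelihood(zero_params, data)  # every p_ui equals 0.5
nll_after = neg_log_likelihood(train_sgd(data, N_USERS, N_ITEMS), data)
print(nll_before, nll_after)
```

With all parameters at zero every prediction is 0.5, so the starting loss is $K \log 2$; training should bring it well below that on this synthetic data.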

Wikipedia
Edition | # users | # articles | # edits
French  | 5.5M    | 1.9M       | 65M
Turkish | 1.4M    | 0.3M       | 8.8M
Competing approaches:
• Average (naive predictor): $p = \frac{\#\,\text{good edits}}{\#\,\text{total edits}}$
• User-only [Adler & de Alfaro, 2007] (reputation system): $p_u = \frac{1}{1 + \exp[-(s_u + b)]}$
• GLAD [Whitehill et al., 2009]: $p_{ui} = \frac{1}{1 + \exp[-(s_u / d_i + b)]}$
• ORES [Halfaker & Taraborelli, 2015] (specialized predictor): uses over 80 content-based and system-based features. The feature set differs between the Turkish and French editions.
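The three formula-based baselines are simple enough to write down directly. The sketch below only transcribes the formulas from the slide; the function names and the toy inputs are hypothetical, and the GLAD form assumes a nonzero difficulty $d_i$.

```python
import math

def average_predictor(outcomes):
    """Naive baseline: fraction of good edits among all observed edits."""
    return sum(outcomes) / len(outcomes)

def user_only(s_u, b=0.0):
    """Reputation-style baseline: depends on the user only."""
    return 1.0 / (1.0 + math.exp(-(s_u + b)))

def glad(s_u, d_i, b=0.0):
    """GLAD-style baseline: skill divided by difficulty (d_i must be nonzero)."""
    return 1.0 / (1.0 + math.exp(-(s_u / d_i + b)))

# Hypothetical inputs for illustration.
p_avg = average_predictor([1, 0, 1, 1])   # 3 good edits out of 4
p_user = user_only(s_u=0.0)               # neutral user, no bias
p_glad_easy = glad(s_u=1.0, d_i=0.5)      # logit 2.0
p_glad_hard = glad(s_u=1.0, d_i=2.0)      # logit 0.5

print(p_avg, p_user, p_glad_easy, p_glad_hard)
```

Note that the average predictor returns the same value for every edit, and the user-only predictor returns the same value for every item a given user edits.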

Wikipedia: results
ORES has the best AUPRC and INTERANK full has the best log-likelihood.
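For readers unfamiliar with the two metrics, here is a minimal sketch of how they can be computed. Average precision is used as one common estimator of the area under the precision-recall curve; whether the authors used this exact estimator, and which class they treated as positive, are not stated on the slide, so both are assumptions here. The toy labels and scores are hypothetical.

```python
import math

def avg_log_likelihood(y_true, y_prob, eps=1e-12):
    """Mean log-likelihood of binary outcomes under predicted probabilities."""
    total = 0.0
    for q, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += q * math.log(p) + (1 - q) * math.log(1.0 - p)
    return total / len(y_true)

def average_precision(y_true, y_score):
    """Mean of the precision values at the rank of each positive example.

    Ties in y_score are broken by input order, which is an arbitrary choice.
    """
    order = sorted(range(len(y_true)), key=lambda k: y_score[k], reverse=True)
    hits, total = 0, 0.0
    for rank, k in enumerate(order, start=1):
        if y_true[k] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Hypothetical labels and scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(labels, scores)          # positives at ranks 1 and 3
ll = avg_log_likelihood([1, 0], [0.5, 0.5])     # uninformative predictions

print(ap, ll)
```

The two metrics reward different things: average precision depends only on the ranking of the scores, while log-likelihood also penalizes poorly calibrated probabilities, so different models can win on each.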

Wikipedia: difficulty parameter $d_i$
Compare:
• Manual ranking of controversial articles [Yasseri et al., 2014]
• Ranking of difficulty parameter $d_i$ as learned by INTERANK
Rank | Title                       | Percentile of $d_i$
1    | Ségolène Royal              | 99.840 %
2    | Unidentified flying object  | 99.229 %
3    | Jehovah's Witnesses         | 99.709 %
4    | Jesus                       | 99.953 %
5    | Sigmund Freud               | 97.841 %
6    | September 11 attacks        | 99.681 %
7    | Muhammad al-Durrah incident | 99.806 %
8    | Islamophobia                | 99.787 %
9    | God in Christianity         | 99.712 %
10   | Nuclear power debate        | 99.304 %
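A percentile of $d_i$ states where an article's learned difficulty falls among all articles. The sketch below uses one possible definition (the share of articles with strictly lower difficulty); the slide does not specify the exact definition, and the difficulty values here are hypothetical.

```python
def percentile_rank(values, target):
    """Percentage of values that are strictly below target."""
    below = sum(1 for v in values if v < target)
    return 100.0 * below / len(values)

# Hypothetical learned difficulties for five articles.
difficulties = [-1.0, -0.5, 0.0, 0.5, 2.0]
pct_hardest = percentile_rank(difficulties, 2.0)
pct_easiest = percentile_rank(difficulties, -1.0)

print(pct_hardest, pct_easiest)
```

Under this definition, a percentile above 99 % means that fewer than 1 % of articles have a difficulty at least as high as that of the article in question.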

Wikipedia: latent factors
Articles with the highest and lowest values in the item embeddings $Y = [y_i]$.
Highest (high culture articles): Seven Wonders of the World, Thomas Edison, Cell, Mustafa Kemal Atatürk, William Shakespeare, Albert Einstein, Democracy, Isaac Newton, Mehmed the Conqueror, Leonardo da Vinci, Louis Pasteur.
Lowest (pop culture articles): Harry Potter's magic list, List of programs broadcasted by Star TV, Bursaspor 2011-12 season, Kral Pop TV Top 20, Death Eater, Heroes (TV series), List of programs broadcasted by TV8, Karadayı, Show TV, List of episodes of Kurtlar Vadisi Pusu.
[Figure: scatter plot of item embeddings. Legend: TV & teen culture, French municipality, Tennis-related, Other. Labeled points include Justine Henin, Julie Halard, Virginia Wade, Marcelo Melo, M. de Robespierre, Nelson Mandela, Charlemagne.]

Linux
# developers | # subsystems | # patches | % accepted
9 672        | 394          | 619 419   | 34.12 %
Dataset from [Jiang et al., 2013].
Developers submit patches to subsystems.
A patch is accepted if it makes it into a Linux release.
Specialized classifier: random forest using 21 features.

Linux: difficulty parameter
Difficulty | Subsystem         | % accepted
+2.66      | usr               | 1.9 %
+1.33      | include           | 7.8 %
+1.04      | lib               | 16.0 %
+1.01      | drivers/clk       | 34.3 %
+0.87      | include/trace     | 17.7 %
-0.80      | arch/mn10300      | 45.4 %
-0.94      | net/nfc           | 73.0 %
-0.99      | drivers/ps3       | 44.3 %
-1.08      | net/tipc          | 43.1 %
-1.19      | drivers/addi-data | 78.3 %
Core components (highest difficulty): avg. number of commits in last quartile = 833.
Peripheral components (lowest difficulty): avg. number of commits in first quartile = 687.
« Higher number of commits leads to lower acceptance rate. » [Jiang et al., 2013]
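As a quick check on the ten rows listed on this slide, one can compute the Pearson correlation between learned difficulty and acceptance rate. The numbers below are copied from the slide; the `pearson` helper is a hypothetical illustration. Because these rows are the five highest and five lowest difficulties rather than a sample of all 394 subsystems, the result only describes these extremes.

```python
import math

# (difficulty, % accepted) pairs as listed on the slide.
rows = [
    (2.66, 1.9), (1.33, 7.8), (1.04, 16.0), (1.01, 34.3), (0.87, 17.7),
    (-0.80, 45.4), (-0.94, 73.0), (-0.99, 44.3), (-1.08, 43.1), (-1.19, 78.3),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson([d for d, _ in rows], [a for _, a in rows])
print(r)
```

The coefficient is negative for these rows: subsystems with higher learned difficulty have lower acceptance rates, which matches the intended reading of the difficulty parameter.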

Conclusion
INTERANK provides a new point in the solution space.
[Figure: accuracy vs. generality plot positioning specialized predictors, INTERANK, and reputation systems.]
Easy to implement and computationally inexpensive.
Yields insights into collaborative projects.

Can who-edits-what predict edit survival? YES!

Thank you! /lca4/interank
