Displaying the level of contentiousness of Wikipedia pages via a - PowerPoint PPT Presentation

Displaying the level of contentiousness of Wikipedia pages via a coloring scheme
http://www.wikitruthiness.com
Katherine Baker, Aaron Miller, David Koenig, Cullen Walsh

Aspirations v. Reality
Goals: determine the total contention of an article's content from reversions, edit wars, and other indicators, at the paragraph/sentence level.
Final results: determine recent contention in an article at sentence-to-word granularity by assigning scores based on content insertion, deletion, and modification.
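The slides say word scores come from insertion, deletion, and modification but give no formula. A minimal sketch, assuming whitespace-tokenized revisions (oldest first) and Python's `difflib.SequenceMatcher` as the word diff; the weights and the rule that a deletion charges its surviving neighbors are hypothetical choices, not the project's method:

```python
from difflib import SequenceMatcher

# Hypothetical weights: the slides do not give the actual values.
INSERT_WEIGHT = 1.0
DELETE_WEIGHT = 1.0
MODIFY_WEIGHT = 1.5


def score_words(revisions):
    """Return (words, scores) for the newest revision.

    `revisions` is a non-empty list of revision texts, oldest first.
    Each score accumulates weighted edits that touched that word, or
    its immediate neighbors in the case of deletions.
    """
    words = revisions[0].split()
    scores = [0.0] * len(words)
    for text in revisions[1:]:
        new_words = text.split()
        new_scores = [0.0] * len(new_words)
        matcher = SequenceMatcher(a=words, b=new_words, autojunk=False)
        deletion_points = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged words keep their accumulated score.
                new_scores[j1:j2] = scores[i1:i2]
            elif tag == "replace":
                # Modified words inherit the highest old score plus a penalty.
                base = max(scores[i1:i2])
                new_scores[j1:j2] = [base + MODIFY_WEIGHT] * (j2 - j1)
            elif tag == "insert":
                new_scores[j1:j2] = [INSERT_WEIGHT] * (j2 - j1)
            elif tag == "delete":
                # Nothing survives in the new text; remember where it was.
                deletion_points.append(j1)
        # Charge the surviving words on either side of each deletion.
        for j in deletion_points:
            for k in (j - 1, j):
                if 0 <= k < len(new_words):
                    new_scores[k] += DELETE_WEIGHT
        words, scores = new_words, new_scores
    return words, scores
```

For example, replacing "cat" with "dog" in the last of three revisions leaves "dog" with the modification weight while untouched words keep a score of zero.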

Technical Overview
[Flowchart: Home → Search → Search Results → Choose Result → Have Cached Result? Yes: Display Cached Result (with a Refresh option). No: Fetch Wikipedia Content → Compute and Analyze Version Diff Graph → Mark Up Content w/ Analysis Results → Display Results.]
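The cache check in the overview can be sketched as a read-through lookup. This is a minimal illustration under stated assumptions: a plain dict stands in for the S3/MySQL cache, a hypothetical `compute` callable stands in for the fetch, diff, analyze, and mark-up steps, and `refresh=True` is assumed to mean "ignore the cached result and recompute". `get_analysis` and `fake_pipeline` are illustrative names only.

```python
from typing import Callable, Dict


def get_analysis(
    title: str,
    cache: Dict[str, str],
    compute: Callable[[str], str],
    refresh: bool = False,
) -> str:
    """Return marked-up article HTML, reusing a cached result unless refreshed."""
    if not refresh and title in cache:
        # "Have cached result? Yes": display the cached result.
        return cache[title]
    # "No" (or Refresh): run the full pipeline and cache its output.
    result = compute(title)
    cache[title] = result
    return result


# Usage with a stand-in pipeline that records how often it runs.
calls = []


def fake_pipeline(title: str) -> str:
    calls.append(title)
    return f"<p>marked-up {title}</p>"


cache: Dict[str, str] = {}
get_analysis("Evolution", cache, fake_pipeline)                # computes
get_analysis("Evolution", cache, fake_pipeline)                # cache hit
get_analysis("Evolution", cache, fake_pipeline, refresh=True)  # recomputes
```

Keeping the expensive pipeline behind one function makes the cache policy easy to swap, which matters when several stores (as the challenges list notes) must stay coherent.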

Front End Details
● Search utilizing Google Search API: initiate processing (Ruby on Rails)
● Wikipedia Scraper: fetch data for processing (Ruby on Rails)
● Render Output w/ MediaWiki API: display the results (Ruby on Rails)
Work by Cullen Walsh
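Rendering marked-up wikitext through the MediaWiki API can be done with its `action=parse` module. The slides do not say which module or parameters the front end used, so this is only a sketch that builds (but does not send) such a request; English Wikipedia's `api.php` endpoint, the `build_render_request` and `extract_html` names, the User-Agent string, and the `<span style=...>` coloring are all assumptions.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: English Wikipedia's MediaWiki Action API.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_render_request(wikitext: str) -> urllib.request.Request:
    """Build, but do not send, an action=parse request that renders wikitext to HTML."""
    params = {
        "action": "parse",
        "format": "json",
        "formatversion": "2",
        "contentmodel": "wikitext",
        "prop": "text",
        "text": wikitext,
    }
    body = urllib.parse.urlencode(params).encode("utf-8")
    # POST, since a full marked-up article can be too long for a GET URL.
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"User-Agent": "contention-sketch/0.1 (hypothetical)"},
        method="POST",
    )


def extract_html(response: dict) -> str:
    """With formatversion=2, the rendered HTML is a plain string at parse.text."""
    return response["parse"]["text"]


# Hypothetical markup: wrap a contested phrase in a colored span.
marked_up = 'The war ended in <span style="background:#ff8080">1815</span>.'
request = build_render_request(marked_up)
```

Sending it with `urllib.request.urlopen(request)` and passing the decoded JSON body to `extract_html` would yield the HTML to display.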

Back End Details
● Difference Analysis: version differences graph (Python)
● Contention Identification: linear scaling, approximating kernel density estimation (KDE) (Python)
Work by David Koenig
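The slides describe contention identification only as "linear scaling (KDE approx.)". One possible reading, sketched here under that assumption: spread per-word edit scores over neighboring words with a triangular kernel (weight falling off linearly with distance, a cheap approximation of a kernel density estimate over word positions), then linearly rescale to [0, 1] to drive the coloring. `kernel_density`, `bandwidth`, and the white-to-red palette in `to_color` are hypothetical choices.

```python
from typing import List, Sequence


def kernel_density(scores: Sequence[float], bandwidth: int = 3) -> List[float]:
    """Spread each word's edit score over nearby words with a triangular
    (linear fall-off) kernel, a cheap stand-in for a full KDE."""
    n = len(scores)
    density = []
    for i in range(n):
        total = 0.0
        # Only words closer than `bandwidth` positions get a positive weight.
        for j in range(max(0, i - bandwidth + 1), min(n, i + bandwidth)):
            weight = 1.0 - abs(i - j) / bandwidth
            total += weight * scores[j]
        density.append(total)
    return density


def scale_to_unit(values: Sequence[float]) -> List[float]:
    """Linearly rescale values to [0, 1] so they can drive a color ramp."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def to_color(level: float) -> str:
    """Map 0 to white and 1 to red (a hypothetical palette)."""
    fade = round(255 * (1.0 - level))
    return f"#ff{fade:02x}{fade:02x}"
```

A single heavily edited word at position 2 of five, with `bandwidth=2`, yields levels of 0, 0.5, 1, 0.5, 0, so its neighbors are tinted at half strength.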

Middleware Details
● AWS
    ● S3: caching results and Wikipedia data
    ● EC2: small instance for the front end; high-CPU instance for analysis
● MySQL
    ● Queuing requests; storing Wikipedia article versions (30 most recent)
Work by David Koenig and Cullen Walsh
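Fetching the 30 most recent versions of an article can be sketched with the MediaWiki `action=query&prop=revisions` module. The slides do not say how the scraper retrieved revisions, so this is an assumption; `revisions_url`, `parse_revisions`, and the English Wikipedia endpoint are illustrative, and no network call is made.

```python
import json
import urllib.parse

# Assumed endpoint: English Wikipedia's MediaWiki Action API.
API_URL = "https://en.wikipedia.org/w/api.php"
MAX_VERSIONS = 30  # the slides keep the 30 most recent versions


def revisions_url(title: str, limit: int = MAX_VERSIONS) -> str:
    """Build a GET URL for the newest `limit` revisions of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": str(limit),
        "rvprop": "ids|timestamp|content",
        "rvslots": "main",
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_revisions(payload: str):
    """Return [(revid, timestamp, wikitext), ...], newest first (the API default)."""
    data = json.loads(payload)
    page = data["query"]["pages"][0]
    # A missing page has no "revisions" key, so fall back to an empty list.
    return [
        (rev["revid"], rev["timestamp"], rev["slots"]["main"]["content"])
        for rev in page.get("revisions", [])
    ]
```

The resulting tuples map naturally onto rows of a versions table keyed by title and revision ID.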

Demonstration

Experimental Methodology
● Compare against related work: WikiTrust
● WikiTrust highlights untrustworthy words in a Wikipedia article based on many parameters
● Compute precision and recall, treating WikiTrust's highlighted words as ground truth
● True Positives = # of our blocks that contain at least one WikiTrust-highlighted word
● False Positives = # of our blocks that contain no WikiTrust-highlighted words
● False Negatives = # of WikiTrust-highlighted words that are not within any of our blocks
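The metric definitions above can be sketched directly. This follows the slide's definitions literally (true and false positives are counted in blocks, false negatives in words); representing each block as a half-open range of word positions and WikiTrust's output as a set of word positions is an assumption, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical representation: each of our highlighted blocks is a
# half-open [start, end) range of word positions in the article.
Block = Tuple[int, int]


def in_block(word: int, block: Block) -> bool:
    start, end = block
    return start <= word < end


def precision_recall(blocks: List[Block], wikitrust_words: Set[int]) -> Tuple[float, float]:
    """Precision and recall using the slide's definitions literally:
    TP and FP count blocks, FN counts WikiTrust-highlighted words."""
    tp = sum(1 for b in blocks if any(in_block(w, b) for w in wikitrust_words))
    fp = len(blocks) - tp
    fn = sum(1 for w in wikitrust_words if not any(in_block(w, b) for b in blocks))
    precision = tp / (tp + fp) if blocks else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With blocks (0, 3), (5, 7), (10, 12) and WikiTrust words {1, 2, 6, 20, 21}, two of three blocks hit (precision 2/3) and two words are missed (recall 2/4 = 0.5).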

Experimental Results
Results of evaluating 33 articles:
● Worst: precision 10.84%, recall 52.43%
● Average: precision 20.25%, recall 68.93%
● Best: precision 38.82%, recall 79.37%
Work by Katherine Baker and Aaron Miller

Challenges
● Getting the algorithm and coloring to work
● Obtaining cache coherency across memcached, S3, and MySQL
● Comparing the data formats of WikiTrust and WikiTruthiness outputs
● Retrieving articles from Wikipedia in a timely fashion

What We Learned
● Mixing technologies and making them interface with one another is difficult
● Choosing your development language is important (e.g., Python is not always the best choice)
● We limited version history to the 30 most recent revisions for speed; in production, we would use more revisions
● Good evaluation requires significant time and effort, especially when crawling and processing-intensive algorithms are involved

Questions
Email: {ajmiller,kbaker4,koenig,ckwalsh}@cs.washington.edu
