
Validating every change

As online communities grow, destructive actors increase, and OSM is vulnerable. Mapbox protects its users from harmful data. Slide topics: incorrect/poor quality data, harmful data, Graffiti Showdown, creative labels.


  1. Validating every change

  2. Vandalism
     ● As online communities grow, destructive actors increase
     ● OSM is vulnerable
     ● Mapbox protects the users from harmful data

  3. Incorrect/poor quality

  4. Harmful data

  5. Graffiti Showdown

  6. Creative labels

  7. Creative labels

  8. Creative labels

  9. Creative labels

  10. Creative labels

  11. Creative labels

  12. Creative labels

  13. Creative labels

  14. Creative labels

  15. Statistics (per million changes)
     ● 570 incorrect labels
     ● 300 editing failures (dragged nodes)
     ● 160 spam incidents
     ● 100 harmful deletions
     ● 50 obscene labels
     ● 20 graffiti
     ● ...

  16. Daily change statistics
     ● 2 million features get touched
     ● 10k label edits
     ● 30k changesets
       ○ 0.2% is vandalism
       ○ 2% are low quality
     ● 20k new contributors join monthly
       ○ 30% of new users make a mistake in their first 10 edits
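The rates on this slide can be turned into rough absolute counts. This is a minimal sketch, assuming the 0.2% and 2% figures apply to the 30k daily changesets (the slide's nesting suggests this but does not state it); all names are illustrative.

```python
# Rough counts implied by the slide's figures.
# Assumption: the 0.2% and 2% rates apply to the 30k daily changesets.
DAILY_CHANGESETS = 30_000
MONTHLY_NEW_CONTRIBUTORS = 20_000

VANDALISM_RATE = 0.002        # 0.2%
LOW_QUALITY_RATE = 0.02       # 2%
NEW_USER_MISTAKE_RATE = 0.30  # 30% err within their first 10 edits


def expected_count(total: int, rate: float) -> int:
    """Return the expected number of items at a given rate, rounded."""
    return round(total * rate)


vandalism_per_day = expected_count(DAILY_CHANGESETS, VANDALISM_RATE)
low_quality_per_day = expected_count(DAILY_CHANGESETS, LOW_QUALITY_RATE)
new_users_with_mistake = expected_count(
    MONTHLY_NEW_CONTRIBUTORS, NEW_USER_MISTAKE_RATE
)

print(vandalism_per_day, low_quality_per_day, new_users_with_mistake)
```

Under that assumption, vandalism is a small daily number in absolute terms, while low-quality edits and new-user mistakes are an order of magnitude or more larger.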

  17. Daily touched features by data layer

  18. Past approaches at Mapbox
     ● Validating changesets
     ● Relying only on algorithms
     ● Monitor new users
     ● Building blacklists
     Diagram labels: potential vandalism, sharp angles check, profanity check, human review, vandalism.
     One approach does not address all cases of vandalism.

  19. Approach
     Step 1: Split the OSM mono layer into data layers.
     Step 2: Diff the changes per day.
     Step 3: Cluster daily changes into deltas.
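Diffing one day's changes and grouping them into deltas could look roughly like the sketch below. It is not Mapbox's implementation: the snapshot format (`{feature_id: tags}`), the `layer_of` lookup, and the choice to cluster by data layer are all assumptions made for illustration, since the slides do not say how deltas are formed.

```python
from collections import defaultdict


def diff_features(before: dict, after: dict) -> list:
    """Compare two snapshots {feature_id: tags} and list what changed."""
    changes = []
    for fid in sorted(before.keys() | after.keys()):
        old, new = before.get(fid), after.get(fid)
        if old is None:
            changes.append((fid, "created"))
        elif new is None:
            changes.append((fid, "deleted"))
        elif old != new:
            changes.append((fid, "modified"))
    return changes


def cluster_by_layer(changes: list, layer_of: dict) -> dict:
    """Group changes into per-layer deltas (hypothetical clustering rule)."""
    deltas = defaultdict(list)
    for fid, kind in changes:
        deltas[layer_of.get(fid, "other")].append((fid, kind))
    return dict(deltas)


# Toy snapshots for two consecutive days.
yesterday = {"n1": {"name": "Main St"}, "w2": {"building": "yes"},
             "n3": {"name": "Cafe"}}
today = {"n1": {"name": "Main Street"}, "w2": {"building": "yes"},
         "w4": {"building": "yes"}}
layer_of = {"n1": "roads", "n3": "poi", "w4": "buildings"}

changes = diff_features(yesterday, today)
deltas = cluster_by_layer(changes, layer_of)
print(deltas)
```

Each delta then becomes a reviewable unit that is smaller than a full changeset and scoped to one kind of data.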

  20. A new unit of change

  21. Approach
     Step 4: Review the daily changes.
     Step 5: Share harmful changes and fix them.
     Step 6: Apply the updates to the map and protect from harmful changes.

  22. Machine review
     ● Profanity checking in 100 languages for labels
     ● Use NLP to determine how likely a label is a place name
     ● Shape classifiers for likelihood of a shape being a building
     ● Drastic changes to stable features
     ● ...
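Two of the machine checks above can be sketched in a few lines. This is a simplified illustration, not the checks Mapbox runs: the blocklist contents, the use of a version count as a proxy for "stable", and both thresholds are assumptions.

```python
import re

# Hypothetical blocklist; a real system would need lists per language.
BLOCKLIST = {"badword", "spamlink"}


def flag_label(label: str) -> bool:
    """Return True if any word in the label is on the blocklist."""
    tokens = re.findall(r"\w+", label.lower())
    return any(token in BLOCKLIST for token in tokens)


def is_drastic_change(old_tags: dict, new_tags: dict, version: int,
                      min_version: int = 5, threshold: float = 0.5) -> bool:
    """Flag a long-lived feature when a large share of its tags changed.

    Assumption: a feature counts as stable once it has reached
    `min_version` revisions.
    """
    if version < min_version or not old_tags:
        return False
    changed = sum(1 for key, value in old_tags.items()
                  if new_tags.get(key) != value)
    return changed / len(old_tags) >= threshold


print(flag_label("Badword Park"), flag_label("Central Park"))
```

A word-list match is only a first filter. It misses misspellings and flags legitimate names that happen to contain a listed word, which is one reason the deck pairs machine review with human review.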

  23. Human review
     ● Review changes in
       ○ geometry
       ○ labels
       ○ hierarchy
       ○ primary tags
     ● Classify harmful changes

  24. Isolate changes

  25. QA of reviews
     ● Review team regularly gets sampled
     ● >99% accuracy for selected cases
     ● Expert mappers double check each review and single out the problematic features
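Sampling reviews and measuring agreement with an expert could be sketched as follows. The record shape (`id`, `verdict`) and the function names are hypothetical; the slides do not describe how the sampling is actually done.

```python
import random


def sample_reviews(reviews: list, k: int, seed: int = 0) -> list:
    """Draw up to k reviews at random for an expert double check."""
    rng = random.Random(seed)
    return rng.sample(reviews, min(k, len(reviews)))


def accuracy(sampled: list, expert_verdicts: dict) -> float:
    """Share of sampled reviews whose verdict matches the expert's."""
    if not sampled:
        return 0.0
    agree = sum(1 for review in sampled
                if expert_verdicts[review["id"]] == review["verdict"])
    return agree / len(sampled)


# Toy data: ten reviews, with the expert disagreeing on one of them.
reviews = [{"id": i, "verdict": "ok"} for i in range(10)]
expert_verdicts = {i: "ok" for i in range(10)}
expert_verdicts[3] = "harmful"

print(accuracy(reviews, expert_verdicts))
```

The reviews where the two verdicts differ are the ones an expert would single out for a closer look.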

  26. Review statistics from Mapbox
     ● Our review team reviews all 80,000 changes on a daily basis
     ● We flag around 1000-2000 changes a day
     ● We fix >200 defects on a daily basis
     ● 50% of issues are fixed by OSM

  27. Daily catch

  28. Sharing vandalism detections
     ● osmcha.mapbox.com is the one stop shop for OSM validation
     ● All our harmful detections are made public
     ● Mapbox regularly fixes harmful data

  29. Sharing harmful edits

  30. Takeaways
     ● Only 0.2% of edits are vandalism
     ● OSM is eventually consistent
     ● Mapbox provides you a validated view of OSM
     ● Let’s protect the future of OSM before vandalism becomes a bigger problem
     ● We need better shared monitoring efforts
