The Vegan Data Diet: How Wikipedia cuts down privacy issues while keeping data fit

Marcel Ruiz Forns, Software Developer, Analytics Team


  1. The Vegan Data Diet: How Wikipedia cuts down privacy issues while keeping data fit

  2. Marcel Ruiz Forns, Software Developer, Analytics Team

  3. Marcel Ruiz Forns, Software Developer, Analytics Team

  4. Marcel Ruiz Forns, Software Developer, Analytics Team. Anyone can edit!

  5. Privacy: ● Why privacy ● What we do ● Implementation ● Pros and cons ● Questions

  6. https://blog.wikimedia.org/2018/04/18/greece-legal-case-ended

  7. By: Hugh D'Andrade, Senior Designer @ EFF https://commons.wikimedia.org/wiki/File:Laptop-spying.jpg

  8. https://transparency.wikimedia.org

  9. Privacy

  10. https://foundation.wikimedia.org/wiki/Privacy_policy

  11. ● Read or edit without account. https://foundation.wikimedia.org/wiki/Privacy_policy

  12. ● Read or edit without account. ● Register account without name, email or any other info. https://foundation.wikimedia.org/wiki/Privacy_policy

  13. ● Read or edit without account. ● Register account without name, email or any other info. ● Never selling/sharing your info with third parties. https://foundation.wikimedia.org/wiki/Privacy_policy

  14. ● Read or edit without account. ● Register account without name, email or any other info. ● Never selling/sharing your info with third parties. ● Retaining your info for shortest time possible. https://foundation.wikimedia.org/wiki/Privacy_policy

  15. Usage Data

  16. Usage Data 500M web requests PER HOUR

  17. Usage Data: 500M web requests PER HOUR, 2000 events PER SECOND

  18. https://stats.wikimedia.org/v2/#/all-projects/reading/legacy-page-views

  19. https://analytics.wikimedia.org/dashboards/browsers/#all-sites-by-os

  20. username mforns | ip_adress 31.214.189.167 | user_agent Mozilla/5.0 (X11; Linux ... | session_id 8c878625792be023 | edit_count 4257 | ui_skin minerva

  21. username mforns | ip_adress 31.214.189.167 | user_agent Mozilla/5.0 (X11; Linux ... | session_id 8c878625792be023 | edit_count 4257 | ui_skin minerva. Data Retention Guidelines: “After at most 90 days, it will be deleted, aggregated, or de-identified.” https://meta.wikimedia.org/wiki/Data_retention_guidelines

  22. Deleting Data

  23. Deleting Data: Are you sure? [Cancel] [Delete]

  24. Deleting Data

  25. --dry-run (undef -> execute) vs. --execute (undef -> dry-run); --tables-to-delete (undef -> all) vs. --tables-to-delete (undef -> none, * -> all)
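One way to read the flag comparison on this slide is that an omitted flag should always resolve to the harmless behaviour. The sketch below illustrates that reading with `argparse`; the parser, the `resolve_tables` helper, and the comma-separated table syntax are assumptions made for illustration, not the actual Wikimedia script.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI whose omitted flags always mean 'do nothing destructive'."""
    parser = argparse.ArgumentParser(description="Illustrative deletion script")
    # Deletion must be requested explicitly: leaving the flag out means dry-run.
    parser.add_argument("--execute", action="store_true", default=False)
    # Leaving the flag out selects no tables; "*" has to be typed to select all.
    parser.add_argument("--tables-to-delete", default="")
    return parser


def resolve_tables(spec: str, known_tables: List[str]) -> List[str]:
    """Translate the flag value into a concrete list of tables."""
    if spec == "":
        return []                    # undef -> none
    if spec == "*":
        return list(known_tables)    # * -> all
    requested = [name.strip() for name in spec.split(",") if name.strip()]
    unknown = [name for name in requested if name not in known_tables]
    if unknown:
        raise ValueError(f"unknown tables: {unknown}")
    return requested
```

With no arguments this parser yields a dry-run over zero tables, so a forgotten flag cannot delete anything.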

  26. Executing tests …
      Tests passed.
      Starting DRY-RUN.
      Checking partitions to delete …
      Partitions that would be deleted by execution:
      - year=2019, month=1, day=1, hour=0, wiki=en.wikipedia
      - year=2019, month=1, day=1, hour=0, wiki=es.wiktionary
      - year=2019, month=1, day=1, hour=0, wiki=de.wikibooks
      - year=2019, month=1, day=1, hour=1, wiki=en.wikipedia
      - year=2019, month=1, day=1, hour=1, wiki=es.wiktionary
      - year=2019, month=1, day=1, hour=1, wiki=de.wikibooks
      - year=2019, month=1, day=1, hour=2, wiki=en.wikipedia
      - year=2019, month=1, day=1, hour=2, wiki=es.wiktionary
      - year=2019, month=1, day=1, hour=2, wiki=de.wikibooks
      DRY-RUN finished.

  27. --database=event --tables=menuClicks --wikis=en.wikipedia --older-than=90 --skip-trash=true --execute=<checksum>
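The `--older-than=90` flag pairs naturally with the hourly partitions listed in the dry-run output. As a rough sketch, and assuming the value is a number of days (the slides do not state the unit, although the retention guideline quoted earlier speaks of 90 days), partition selection could look like the following. The function name and the dictionary layout are hypothetical.

```python
from datetime import datetime, timedelta


def partitions_older_than(partitions, days, now):
    """Return the hourly partitions whose timestamp is older than `days` days.

    Each partition is a dict with year, month, day, hour and wiki keys,
    mirroring the partition specs printed by the dry-run.
    """
    cutoff = now - timedelta(days=days)
    selected = []
    for part in partitions:
        stamp = datetime(part["year"], part["month"], part["day"], part["hour"])
        if stamp < cutoff:
            selected.append(part)
    return selected


partitions = [
    {"year": 2019, "month": 1, "day": 1, "hour": 0, "wiki": "en.wikipedia"},
    {"year": 2019, "month": 3, "day": 1, "hour": 0, "wiki": "en.wikipedia"},
]
# With "now" fixed at 2019-04-10 the cutoff is 2019-01-10,
# so only the January partition qualifies for deletion.
old = partitions_older_than(partitions, days=90, now=datetime(2019, 4, 10))
```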

  28. --database=event --tables=menuClicks --wikis=en.wikipedia --older-than=90 --skip-trash=true --execute=<checksum>
      Executing tests …
      Tests passed.
      Starting DRY-RUN.
      Checking partitions to delete …
      Partitions that would be deleted by execution:
      - year=2019, month=1, day=1, hour=0, wiki=en.wikipedia
      - year=2019, month=1, day=1, hour=0, wiki=es.wiktionary
      - year=2019, month=1, day=1, hour=0, wiki=de.wikibooks
      - year=2019, month=1, day=1, hour=1, wiki=en.wikipedia
      - year=2019, month=1, day=1, hour=1, wiki=es.wiktionary
      - year=2019, month=1, day=1, hour=1, wiki=de.wikibooks
      - year=2019, month=1, day=1, hour=2, wiki=en.wikipedia
      - year=2019, month=1, day=1, hour=2, wiki=es.wiktionary
      - year=2019, month=1, day=1, hour=2, wiki=de.wikibooks
      DRY-RUN finished.
      Parameter checksum: 57ca7987d987e9e98a6c79

  29. #1 Dry-run: --database=event --tables=menuClicks --wikis=en.wikipedia --older-than=90 --skip-trash=true
      Executing tests …
      Tests passed.
      Starting DRY-RUN.
      Checking partitions to delete …
      Partitions that would be deleted by execution:
      - year=2019, month=1, day=1, hour=0, wiki=en.wikipedia
      - year=2019, month=1, day=1, hour=0, wiki=es.wiktionary
      - year=2019, month=1, day=1, hour=0, wiki=de.wikibooks
      - year=2019, month=1, day=1, hour=1, wiki=en.wikipedia
      - year=2019, month=1, day=1, hour=1, wiki=es.wiktionary
      - year=2019, month=1, day=1, hour=1, wiki=de.wikibooks
      - year=2019, month=1, day=1, hour=2, wiki=en.wikipedia
      - year=2019, month=1, day=1, hour=2, wiki=es.wiktionary
      - year=2019, month=1, day=1, hour=2, wiki=de.wikibooks
      DRY-RUN finished.
      Parameter checksum: 57ca7987d987e9e98a6c79
      #2 Execute: --execute=<checksum>
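The two-step flow on this slide (dry-run first, then execute with the printed checksum) can be sketched as follows. This is a minimal illustration under stated assumptions: the slides do not say how the checksum is computed, so SHA-256 over the JSON-encoded parameters is a stand-in, and `parameter_checksum` and `run` are hypothetical names.

```python
import hashlib
import json


def parameter_checksum(params):
    """Checksum over the parameters (assumed scheme: SHA-256 of canonical JSON)."""
    canonical = json.dumps(params, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run(params, partitions, execute=None):
    """Dry-run unless `execute` carries the checksum of exactly these parameters."""
    expected = parameter_checksum(params)
    if execute is None:
        # Step 1: report what would be deleted and hand back the checksum.
        return {"mode": "dry-run", "would_delete": list(partitions), "checksum": expected}
    if execute != expected:
        # The parameters changed since the dry-run, or no dry-run was done.
        raise ValueError("checksum does not match these parameters; run a dry-run first")
    # Step 2: a real script would drop the partitions here.
    return {"mode": "execute", "deleted": list(partitions)}


params = {"database": "event", "tables": "menuClicks",
          "wikis": "en.wikipedia", "older_than": 90, "skip_trash": True}
partitions = ["year=2019, month=1, day=1, hour=0, wiki=en.wikipedia"]

preview = run(params, partitions)                               # step 1
result = run(params, partitions, execute=preview["checksum"])   # step 2
```

Because the checksum depends on every parameter, changing any flag between the two steps invalidates it and forces a fresh dry-run.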

  30. Sanitizing Data

  31. Sanitizing Data (Advanced)

  32. Unsanitized data: 90 days

  33. Unsanitized data: 90 days. Sanitized data: kept indefinitely

  34. Unsanitized data: 90 days. Sanitized data: kept indefinitely

  35. Unsanitized data: 90 days. Sanitized data: kept indefinitely

  36. Unsanitized: date 2019-01-01 | ip 31.214.189.167 | user_agent Mozilla/5.0 (X11; Linux ... | wiki en.wikipedia | action click | target menu

  37. Black-list. Unsanitized: date 2019-01-01 | ip 31.214.189.167 | user_agent Mozilla/5.0 (X11; Linux ... | wiki en.wikipedia | action click | target menu

  38. Black-list. Unsanitized -> Sanitized: date 2019-01-01 -> 2019-01-01 | ip 31.214.189.167 -> NULL | user_agent Mozilla/5.0 (X11; Linux ... -> NULL | wiki en.wikipedia -> en.wikipedia | action click -> click | target menu -> menu

  39. Black-list. Unsanitized -> Sanitized: date 2019-01-01 -> 2019-01-01 | ip 31.214.189.167 -> NULL | user_agent Mozilla/5.0 (X11; Linux ... -> NULL | wiki en.wikipedia -> en.wikipedia | action click -> click | target menu -> menu | cookie_id 724310 -> 724310
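The black-list slides can be illustrated with a few lines of code. The sketch below nulls every field named in a black-list and passes everything else through; the function name and the black-list contents are assumptions chosen to match the slide values.

```python
def sanitize_blacklist(record, blacklist):
    """Null out black-listed fields; every other field passes through unchanged."""
    return {field: (None if field in blacklist else value)
            for field, value in record.items()}


BLACKLIST = {"ip", "user_agent"}  # assumed: the fields known to be sensitive

record = {
    "date": "2019-01-01",
    "ip": "31.214.189.167",
    "user_agent": "Mozilla/5.0 (X11; Linux ...",
    "wiki": "en.wikipedia",
    "action": "click",
    "target": "menu",
    "cookie_id": 724310,  # field added later that nobody black-listed
}

sanitized = sanitize_blacklist(record, BLACKLIST)
```

The `cookie_id` field shows the weakness of this approach: a newly added sensitive field survives sanitization until someone remembers to black-list it.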

  40. White-list. Unsanitized: date 2019-01-01 | ip 31.214.189.167 | user_agent Mozilla/5.0 (X11; Linux ... | wiki en.wikipedia | action click | target menu

  41. White-list. Unsanitized -> Sanitized: date 2019-01-01 -> 2019-01-01 | ip 31.214.189.167 -> NULL | user_agent Mozilla/5.0 (X11; Linux ... -> NULL | wiki en.wikipedia -> en.wikipedia | action click -> click | target menu -> menu

  42. White-list. Unsanitized -> Sanitized: date 2019-01-01 -> 2019-01-01 | ip 31.214.189.167 -> NULL | user_agent Mozilla/5.0 (X11; Linux ... -> NULL | wiki en.wikipedia -> en.wikipedia | action click -> click | target menu -> menu | cookie_id 724310 -> NULL

  43. White-list. Unsanitized -> Sanitized: date 2019-01-01 -> 2019-01-01 | ip 31.214.189.167 -> Spain | user_agent Mozilla/5.0 (X11; Linux ... -> NULL | wiki en.wikipedia -> en.wikipedia | action click -> click | target menu -> menu | cookie_id 724310 -> NULL

  44. White-list. Unsanitized -> Sanitized: date 2019-01-01 -> 2019-01-01 | ip 31.214.189.167 -> Spain | user_agent Mozilla/5.0 (X11; Linux ... -> Linux | wiki en.wikipedia -> en.wikipedia | action click -> click | target menu -> menu | cookie_id 724310 -> NULL

  45. White-list. Unsanitized -> Sanitized: date 2019-01-01 -> 2019-01-01 | ip 31.214.189.167 -> Spain | user_agent Mozilla/5.0 (X11; Linux ... -> Linux | wiki en.wikipedia -> en.wikipedia | action click -> click | target menu -> menu | cookie_id 724310 -> 8d56ab209e10 (# hashed)
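The white-list slides build up three ideas: fields not listed become NULL, listed fields can be kept as-is, and some fields are kept only in a reduced form (country instead of IP, operating system instead of user agent, a hash instead of the cookie id). A minimal sketch of that scheme follows. All helper names are hypothetical, the geolocation lookup is a hard-coded stand-in, and the salted HMAC-SHA256 truncated to 12 hex characters is an assumed hashing scheme, so it will not reproduce the hash value shown on the slide.

```python
import hashlib
import hmac
from functools import partial


def keep(value):
    """Keep the value unchanged."""
    return value


def country_of(ip):
    """Hypothetical stand-in for a geolocation lookup (a real one would query a GeoIP database)."""
    return {"31.214.189.167": "Spain"}.get(ip)


def os_family(user_agent):
    """Very rough OS detection; Android is checked first because its user agents also contain 'Linux'."""
    for name in ("Android", "Windows", "Mac OS X", "Linux"):
        if name in user_agent:
            return name
    return None


def salted_hash(value, salt):
    """Assumed scheme: HMAC-SHA256 with a secret salt, truncated to 12 hex characters."""
    digest = hmac.new(salt, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]


def sanitize_whitelist(record, whitelist):
    """Apply the white-listed transform to each field; anything not listed becomes None."""
    return {field: (whitelist[field](value) if field in whitelist else None)
            for field, value in record.items()}


SALT = b"example-salt"  # assumption: a secret that is rotated or discarded in practice
WHITELIST = {
    "date": keep, "wiki": keep, "action": keep, "target": keep,
    "ip": country_of,
    "user_agent": os_family,
    "cookie_id": partial(salted_hash, salt=SALT),
}

record = {
    "date": "2019-01-01", "ip": "31.214.189.167",
    "user_agent": "Mozilla/5.0 (X11; Linux ...",
    "wiki": "en.wikipedia", "action": "click", "target": "menu",
    "cookie_id": 724310,
    "session_token": "abc",  # new field nobody white-listed
}
sanitized = sanitize_whitelist(record, WHITELIST)
```

Unlike the black-list, an unlisted field such as `session_token` is dropped by default, so forgetting to update the list errs on the side of privacy.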

  46. Privacy Culture

  47. Unique visitors

  48. Unique visitors: UUID

  49. Unique visitors: UUID, REQ UUID, REQ UUID

  50. Unique visitors: UUID, REQ UUID, REQ UUID. SELECT COUNT(DISTINCT uuid) FROM database.table WHERE date = '2019-01-01';

  51. Unique visitors: UUID, REQ UUID

  52. Unique visitors: LAST ACCESS

  53. Unique visitors: LA, REQ LA, REQ LAST ACCESS

  54. Unique visitors: LA, REQ LA, REQ LAST ACCESS. SELECT COUNT(*) FROM database.table WHERE (la IS NULL OR la < date) AND date = '2019-01-01';
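The last-access query counts a request as a new visitor for a day when the request carried no last-access value or one from an earlier day. The simulation below is a sketch of how that could work end to end, assuming the browser stores only the date of its previous visit and sends it with each request; the `simulate` and `unique_visitors` names are hypothetical, and visitor names exist only to model separate browsers and never reach the log.

```python
from datetime import date


def simulate(visits):
    """Build a request log from (browser, day) visits in chronological order.

    Each browser keeps only its last access date in a cookie. The log stores
    that date (`la`) and the request date, never an identifier.
    """
    cookies = {}  # per-browser cookie, lives on the client side
    log = []
    for browser, day in visits:
        log.append({"date": day, "la": cookies.get(browser)})
        cookies[browser] = day  # the response refreshes the cookie
    return log


def unique_visitors(log, day):
    """Mirror of the slide's WHERE clause: (la IS NULL OR la < date) AND date = day."""
    return sum(1 for row in log
               if row["date"] == day and (row["la"] is None or row["la"] < day))


d1, d2 = date(2019, 1, 1), date(2019, 1, 2)
visits = [("alice", d1), ("alice", d1), ("bob", d1),
          ("alice", d2), ("carol", d2), ("carol", d2)]
log = simulate(visits)
```

In this example both days have two distinct browsers, and the log-based count reaches the same figure without storing any per-visitor identifier.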

  55. By: Victor Grigas https://commons.wikimedia.org/wiki/File:Papaul_Tshibamba-4.jpg

  56. The Vegan Data Diet

  57. The Vegan Data Diet. ● Guarantee of privacy ● Less work related to data requests ● Easier to publicize

  58. The Vegan Data Diet. Pros: ● Guarantee of privacy ● Less work related to data requests ● Easier to publicize. Cons: ● Extra work ● Privacy culture needs time ● Analysts have to compromise

  59. QUESTIONS? By: Randall Munroe @ XKCD https://xkcd.com/285/
