

  1. Privacy & Fairness in Data Science CS848 Fall 2019

  2. 2 Instructor Xi He: • Research interest: privacy and fairness for big-data management and analysis • CS848, Fall 2019: – Tue: 3:00pm - 5:50pm (DC2568)

  3. 3 Tell me … why do you want to take this course?

  4. 4 Personalization …

  5. 5 Online Advertising In perspective: ~90% of Google’s revenue comes from online ads (as of 2015)

  6. 6 Online Advertising In perspective: ~90% of Google’s revenue comes from online ads (as of 2015)

  7. 7

  8. 8 Health Red: official numbers from the Centers for Disease Control and Prevention, weekly. Black: estimates based on Google search logs, daily (potentially instantaneous). Detecting influenza epidemics using search engine query data http://www.nature.com/nature/journal/v457/n7232/full/nature07634.html

  9. 9 Medicine https://www.nature.com/news/personalized-medicine-time-for-one-person-trials-1.17411

  10. 10 Precision Medicine Source: forbes.com

  11. 11 Predictive Policing

  12. 12 Predictive Policing

  13. 13 The dark side of the force… http://ragekg.deviantart.com/art/The-Dark-Side-of-the-Force-174559980

  14. 14 39% of the experts agree… Thanks to many changes, including the building of “the Internet of Things,” human and machine analysis of Big Data will cause more problems than it solves by 2020. The existence of huge data sets for analysis will engender false confidence in our predictive powers and will lead many to make significant and hurtful mistakes. Moreover, analysis of Big Data will be misused by powerful people and institutions with selfish agendas who manipulate findings to make the case for what they want. And the advent of Big Data has a harmful impact because it serves the majority (at times inaccurately) while diminishing the minority and ignoring important outliers. Overall, the rise of Big Data is a big negative for society in nearly all respects. — 2012 Pew Research Center Report http://pewinternet.org/Reports/2012/Future-of-Big-Data/Overview.aspx

  15. 15 Harm due to personalized data analytics … • Privacy • Fairness

  16. 16 Where is the data coming from?

  17. 17 Where is the data coming from? Very sensitive information: • Census surveys • IRS Records • Medical records • Insurance records • Search logs • Browse logs • Shopping histories • Photos • Videos • Smart phone sensors • Mobility trajectories • …

  18. 18 How is this data collected? http://graphicsweb.wsj.com/documents/divSlider/media/ecosystem100730.png

  19. 19 Isn’t my data anonymous?

  20. 20 Device Fingerprinting

  21. 21 https://panopticlick.eff.org/

  22. 22 Let’s get rid of unique identifiers …

  23. 23 The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002] Medical Data: Name, SSN, Zip, Visit Date, Birth date, Diagnosis, Procedure, Medication, Sex, Total Charge

  24. 24 The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002] Medical Data (Name, SSN, Zip, Visit Date, Birth date, Diagnosis, Procedure, Medication, Sex, Total Charge) and Voter List (Name, Address, Zip, Date Registered, Party affiliation, Birth date, Sex, Date last voted)

  25. 25 The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002] The Governor of MA was uniquely identified by joining the Medical Data and the Voter List on ZipCode, Birth Date, and Sex — his Name was thereby linked to his Diagnosis.

  26. 26 The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002] 87% of the US population is uniquely identified by the quasi-identifier (ZipCode, Birth Date, Sex).
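The join on these slides can be sketched in a few lines of Python. All records below are invented for illustration, and the `link` helper is not from the course materials — it is a minimal sketch of the attack, assuming both tables carry the same quasi-identifier columns.

```python
# Toy sketch of the linkage attack: a "de-identified" medical table is
# joined with a public voter list on the quasi-identifier (zip, birth, sex).
# All records here are made up.

medical = [  # names removed, quasi-identifiers kept
    {"zip": "02138", "birth": "1945-07-31", "sex": "M", "diagnosis": "heart disease"},
    {"zip": "02139", "birth": "1962-01-15", "sex": "F", "diagnosis": "asthma"},
]

voters = [  # public record: names plus the same quasi-identifiers
    {"name": "Pat Smith", "zip": "02138", "birth": "1945-07-31", "sex": "M"},
    {"name": "Jan Doe", "zip": "02139", "birth": "1962-01-15", "sex": "F"},
]

def link(medical, voters):
    """Re-identify medical records by matching on the quasi-identifier."""
    qid = lambda r: (r["zip"], r["birth"], r["sex"])
    names = {qid(v): v["name"] for v in voters}
    return {names[qid(m)]: m["diagnosis"] for m in medical if qid(m) in names}

print(link(medical, voters))  # every "anonymous" record gets a name back
```

Removing names and SSNs does nothing here: the join key is exactly the quasi-identifier that both tables share.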

  27. 27 AOL data publishing fiasco

  28. 28 AOL data publishing fiasco … Xi222 Uefa cup Xi222 Uefa champions league Xi222 Champions league final Xi222 Champions league final 2013 Abel156 exchangeability Abel156 Proof of de Finetti’s theorem Jane12345 Zombie games Jane12345 Warcraft Jane12345 Beatles anthology Jane12345 Ubuntu breeze Bob222 Python in thought Bob222 Enthought Canopy

  29. 29 User IDs replaced with random numbers 865712345 Uefa cup 865712345 Uefa champions league 865712345 Champions league final 865712345 Champions league final 2013 236712909 exchangeability 236712909 Proof of de Finetti’s theorem 112765410 Zombie games 112765410 Warcraft 112765410 Beatles anthology 112765410 Ubuntu breeze 865712345 Python in thought 865712345 Enthought Canopy
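The release strategy on this slide can be sketched directly. The function name and log entries below are illustrative, not from the actual AOL release; the key property is that each user gets one consistent random pseudonym.

```python
# Minimal sketch of an AOL-style release: replace each user ID with a
# random number, consistently per user.
import random

def pseudonymize(log):
    mapping = {}  # user ID -> random pseudonym, assigned once per user
    out = []
    for user, query in log:
        if user not in mapping:
            mapping[user] = random.randint(10**8, 10**9 - 1)
        out.append((mapping[user], query))
    return out

log = [("Xi222", "Uefa cup"),
       ("Xi222", "Champions league final"),
       ("Jane12345", "Zombie games")]
pseudo = pseudonymize(log)
# A user's queries stay grouped under one pseudonym, so the query
# history itself can still re-identify the user.
```

Because the mapping is consistent, all of a user’s queries remain linkable — which is precisely what allowed individual AOL users to be re-identified from query content alone.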

  30. 30 Privacy Breach [NYTimes 2006]

  31. 31 Machine learning models can reveal sensitive information [Figure: number of ad impressions for the same Facebook profile — 25 impressions when the ad targets users interested in Men, 0 when it targets users interested in Women] Facebook’s learning algorithm uses private information to predict match to ad [Korolova JPC 2011]

  32. 32 Genome-wide association studies [Homer et al PLOS Genetics 08] Results of a GWAS study + high-density SNP profile of Bob → Did Bob participate in the study?

  33. 33 Harm due to personalized data analytics … • Privacy • Fairness

  34. 34 The red side of learning • Redlining: the practice of denying, or charging more for, services such as banking, insurance, access to health care, or even supermarkets, or denying jobs to residents of particular, often racially determined, areas.

  35. 35 Predictive Policing • Predictive policing systems use machine learning algorithms to predict crime. • But … the algorithms learn patterns not about crime per se, but about how police record crime. • This can amplify existing biases.

  36. 36 https://www.nytimes.com/2015/07/10/upshot/ when-algorithms-discriminate.html

  37. 37

  38. 38 Deep Learning Incredibly powerful tool for … • Extracting regularities from data • Amplifying bias!

  39. 39 http://slides.com/simonescardapane/the-dark-side-of-deep-learning

  40. 40 http://slides.com/simonescardapane/the-dark-side-of-deep-learning

  41. 41 Deep Learning Incredibly powerful tool for … • Extracting regularities from data • Amplifying privacy concerns!

  42. 42

  43. 43 This course: Learn to combat the dark side http://www.webvisionsevent.com/userfiles/lightsabercrop_large_verge_medium_landscape.jpg

  44. 44 You will … • mathematically formulate privacy. • mathematically formulate fairness.

  45. 45 Differential Privacy For every pair of inputs D1 and D2 that differ in one row, and for every output O, an adversary should not be able to distinguish between any D1 and D2 based on any O: log( Pr[A(D1) = O] / Pr[A(D2) = O] ) < ε (ε > 0)
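One standard way to satisfy this definition (not shown on the slide) is the Laplace mechanism. The sketch below, with made-up data and illustrative function names, adds Laplace(1/ε) noise to a counting query; a count changes by at most 1 when one row changes, so its sensitivity is 1.

```python
# Sketch of the Laplace mechanism for an epsilon-differentially-private count.
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) random variable.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def noisy_count(records, predicate, epsilon):
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one row
    changes the count by at most 1), so the noise scale is 1/epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

ages = [34, 29, 41, 52, 38]
print(noisy_count(ages, lambda a: a > 35, epsilon=0.5))
```

Smaller ε means stronger privacy and noisier answers; the released value is random, so repeating the query spends additional privacy budget.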

  46. 46 You will … • mathematically formulate privacy. • mathematically formulate fairness. • design algorithms to ensure privacy • design algorithms to ensure fairness

  47. 47 Differential Privacy in practice OnTheMap [ICDE 2008] [CCS 2014] [Apple WWDC 2016]

  48. 48 You will … • mathematically formulate privacy. • mathematically formulate fairness. • design algorithms to ensure privacy • design algorithms to ensure fairness • do research into the interplay between privacy and fairness.

  49. 49 Course Format • Module 1: Intro to Privacy • Module 2: Intro to Fairness • Module 3: Paper Reading by Topics – privacy vs. fairness – private machine learning – deployments of DP – sources of bias – fairness mechanisms • Activities: Lectures, In-class Exercise, In-class Mini-project, Read papers, Mini-critiques, Research Project

  50. 50

  51. 51 What we expect you to know … • Strong background in – Probability – Proof techniques • Some knowledge of – Programming with Python – Machine learning – Statistics – Algorithms

  52. 52 Misc. course info • Website: https://cs.uwaterloo.ca/~xihe/cs848 – Schedule (with links to lecture slides, readings, projects, etc.) • Grading – In-class mini-projects: 10% x 2 – Mini-critiques: 10% – Class participation and presentation: 20% • Attending class! – Project: 50% • LEARN for submission and grades: https://learn.uwaterloo.ca/d2l/home/492027

  53. 53 Academic Integrity • See course website • Mini-project reports and paper critiques are individual work and must be submitted individually. • Group discussion is okay (and encouraged), but – Acknowledge help you receive from others – Make sure you “own” your solution • All suspected cases of violation will be aggressively pursued

  54. 54 Reference • Course materials are adapted from: https://sites.duke.edu/cs590f18privacyfairness/
