
Responsible data science: Information ethics & privacy



  1. Responsible data science: Information ethics & privacy. Dino Pedreschi, EUI-SoBigData.eu workshop, 11 October 2017

  2. URBAN MOBILITY ATLAS

  3. Urban Mobility Atlas http://kdd.isti.cnr.it/uma2/

  4. REAL TIME DEMOGRAPHY

  5. A Sociometer based on mobile phone data for real-time demographics [Figure: GSM calls temporal profile; profile map]


  7. San Pietro Square

  8. DIVERSITY & WELLBEING

  9. Big Data: Diversity and economic development

  10. THE POLYCENTRIC CITY

  11. EMERGENT CITY STRUCTURE: FLUXES ORIGINATING IN TUSCAN CITIES

  12. POLYCENTRIC CITY

  13. Ethics and Security

  14. The GDPR • Will apply from 25 May 2018 • Introduces important novelties • New obligations • New rights

  15. Privacy by Design

  16. Privacy-by-design big data analytics • Design analytical processes that implement the privacy-by-design and by-default principle • Consider privacy at every stage of the business • Integrate privacy requirements "by design" into the business model

  17. Privacy-by-Design Methodology in Big Data Analytics • The framework is designed with assumptions about: – the sensitive data that are the subject of the analysis; – the attack model, i.e., the knowledge and purpose of a malicious party that wants to discover the sensitive data; – the target analytical questions that are to be answered with the data • Design a privacy-preserving framework able to: – transform the data into an anonymous version with a quantifiable privacy guarantee; – guarantee that the analytical questions can be answered correctly, within a quantifiable approximation that specifies the data utility
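A minimal sketch of the "transform, then measure utility" idea in the slide above, assuming point records of the form (user_id, x, y) and a uniform square grid. The technique here is a simple k-threshold on released cell counts (in the spirit of k-anonymity), swapped in for illustration; it is not the framework presented in the talk. The names `aggregate_with_k_threshold`, `cell_of`, `cell_size` and `k` are hypothetical, and the utility figure is only a crude proxy.

```python
from collections import defaultdict

def cell_of(x, y, cell_size):
    # Map a point (hypothetical units, e.g. metres) to a square grid cell.
    return (int(x // cell_size), int(y // cell_size))

def aggregate_with_k_threshold(records, cell_size, k):
    """Aggregate (user_id, x, y) records into grid cells; release a cell's
    user count only if at least k distinct users fall in it."""
    users_per_cell = defaultdict(set)
    for user_id, x, y in records:
        users_per_cell[cell_of(x, y, cell_size)].add(user_id)

    released = {c: len(u) for c, u in users_per_cell.items() if len(u) >= k}
    suppressed = sorted(c for c, u in users_per_cell.items() if len(u) < k)

    # Crude utility proxy: fraction of records whose cell survives suppression.
    kept = sum(1 for _, x, y in records if cell_of(x, y, cell_size) in released)
    utility = kept / len(records) if records else 0.0
    return released, suppressed, utility

# Toy data, invented for illustration.
records = [("a", 10, 10), ("b", 20, 30), ("c", 40, 5),
           ("d", 150, 150), ("a", 160, 120)]
released, suppressed, utility = aggregate_with_k_threshold(records, cell_size=100, k=3)
print(released, suppressed, utility)  # {(0, 0): 3} [(1, 1)] 0.6
```

Raising k strengthens the guarantee on each released count but suppresses more cells, and the utility proxy makes that trade-off visible.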

  18. Privacy Risk Assessment. Knowledge Discovery and Delivery Lab (ISTI-CNR & Univ. Pisa), www-kdd.isti.cnr.it

  19. Privacy Risk Assessment Framework

  20. Simulation of privacy-harmful inferences • Data dimension: the spatial area in which the analysis is performed. • Background knowledge dimension: the temporal window (in weeks) in which the attacker recorded the user activity. • I-RACu: an indicator of the risk of re-identification of the users. [Figure: two panels, axis labelled "# weeks of BK"]
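To make the background-knowledge dimension concrete, the following sketch computes a simplified worst-case re-identification risk. It assumes each user is described by the set of locations visited, and that a longer observation window gives the attacker more known locations, modelled here by a hypothetical parameter `h`. This is an illustrative stand-in, not the I-RACu indicator itself, whose exact definition is not given on the slide.

```python
from itertools import combinations

def reidentification_risk(visits, user, h):
    """Worst-case probability of re-identifying `user` when the attacker
    knows h of the locations the user visited.

    For each possible h-subset of the user's locations, the attacker cannot
    distinguish among the users whose visits contain that subset, so the
    risk is 1 / (number of matching users). The maximum is returned."""
    own = sorted(visits[user])
    h = min(h, len(own))
    worst = 0.0
    for known in combinations(own, h):
        known_set = set(known)
        matches = sum(1 for locs in visits.values() if known_set <= locs)
        worst = max(worst, 1.0 / matches)  # matches >= 1: the user always matches
    return worst

# Hypothetical visit sets (location IDs are placeholders).
visits = {"u1": {"A", "B", "C"}, "u2": {"A", "B"}, "u3": {"A", "C", "D"}}
for h in (1, 2):
    print(h, reidentification_risk(visits, "u1", h))  # prints "1 0.5", then "2 1.0"
```

Enumerating every h-subset grows combinatorially, so this brute-force version only suits small h and small datasets.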

  21. Privacy-by-design for big data analytics • All case studies discussed have been designed within a privacy-preserving framework • taking into account data minimization in the deployment of the service • transforming raw data into aggregated data with a quantified (low) risk of privacy breach

  22. But we need to go further! • A city cannot be managed centrally, from a control room. • Our cities are complex networks of interactions – the outcome for everybody depends not only on individual choices but is also conditioned by everybody else's choices.

  23. • A granular capability of citizens to self-organize, collaborate and coordinate their actions from the bottom up is more efficient and resilient • But it requires aligning individual interests and goals with those of the collectivity in the system. – We humans have a limited perception of ourselves as a social, collective living being

  24. TOWARDS A PERSONAL DATA ECOSYSTEM

  25. • An avalanche of personal information that, in most cases, gets lost – like tears in rain. • Yet only each one of us, individually, has the power to connect all this personal information into a personal data repository – and make sense of it.

  26. A user-centric ecosystem for personal big data

  27. Personal Data Ecosystem

  28. Where am I? Comparison with the community
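A "Where am I?" comparison can be as simple as a percentile rank. The sketch below assumes a single hypothetical metric (say, weekly kilometres travelled) and a list of community values already available to the user, for example as aggregates shared by peers; both the metric and the function name `percentile_rank` are illustrative, not part of the ecosystem described in the talk.

```python
from bisect import bisect_left, bisect_right

def percentile_rank(my_value, community_values):
    """Percentage of community values below mine, counting ties as half."""
    if not community_values:
        raise ValueError("community_values must not be empty")
    values = sorted(community_values)
    below = bisect_left(values, my_value)            # strictly smaller values
    equal = bisect_right(values, my_value) - below   # values equal to mine
    return 100.0 * (below + 0.5 * equal) / len(values)

# Hypothetical weekly kilometres for five community members.
community = [5, 10, 10, 20, 40]
print(percentile_rank(10, community))  # 40.0
print(percentile_rank(30, community))  # 80.0
```

Counting ties as half keeps the rank symmetric, so a user whose value equals everyone else's lands at the 50th percentile.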

  29. • We need a Personal Data Ecosystem – to acquire, integrate and make sense of our own data – and to connect with our peers and the surrounding urban community and infrastructure • for the purpose of developing the collective awareness needed to face our grand challenges

  30. A smart city is a city of participating, aware citizens

  31. Big Data, Big Risks • "Big data is algorithmic, therefore it cannot be biased!" And yet… • All traditional evils of social discrimination, and many new ones, exhibit themselves in the big data ecosystem • Because of its tremendous power, massive data analysis must be used responsibly • Technology alone won't do: we also need policy, user involvement and education efforts

  32. • By 2018, 50% of business ethics violations will occur through improper use of big data analytics [source: Gartner, 2016] AI and Big Data

  33. The danger of black boxes • The COMPAS score (Correctional Offender Management Profiling for Alternative Sanctions) • A 137-question questionnaire and a predictive model for "risk of crime recidivism." The model is a proprietary secret of Northpointe, Inc. • The data journalists at propublica.org have shown that the model has a strong ethnic bias – blacks who did not reoffend were classified as high risk twice as often as whites who did not reoffend – whites who did reoffend were classified as low risk twice as often as blacks who did reoffend.
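The two disparities reported by ProPublica amount to comparing error rates across groups: the false positive rate (non-reoffenders classified as high risk) and the false negative rate (reoffenders classified as low risk). The sketch below computes both per group; the records are invented toy data, not COMPAS data, and `error_rates_by_group` is a hypothetical helper.

```python
from collections import defaultdict

def error_rates_by_group(records):
    """records: iterable of (group, predicted_high_risk, reoffended) with bool labels.
    Returns {group: (false_positive_rate, false_negative_rate)}."""
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for group, predicted_high, reoffended in records:
        c = counts[group]
        if reoffended:
            c["pos"] += 1
            if not predicted_high:
                c["fn"] += 1  # reoffender labelled low risk
        else:
            c["neg"] += 1
            if predicted_high:
                c["fp"] += 1  # non-reoffender labelled high risk
    return {
        g: (c["fp"] / c["neg"] if c["neg"] else float("nan"),
            c["fn"] / c["pos"] if c["pos"] else float("nan"))
        for g, c in counts.items()
    }

# Invented toy records: (group, predicted_high_risk, reoffended).
records = [
    ("A", True, False), ("A", True, False), ("A", False, False), ("A", False, False),
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", False, True),
    ("B", True, False), ("B", False, False), ("B", False, False), ("B", False, False),
    ("B", True, True), ("B", True, True), ("B", False, True), ("B", False, True),
]
print(error_rates_by_group(records))  # {'A': (0.5, 0.25), 'B': (0.25, 0.5)}
```

In this toy data, group A's false positive rate is twice group B's while group B's false negative rate is twice group A's, the same shape of disparity described on the slide.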


  35. The danger of black boxes • An accurate but untrustworthy classifier may result from an accidental bias in the training data. • In a task of discriminating wolves from huskies in a dataset of images, the resulting deep learning model is shown to classify a wolf in a picture based solely on … the presence of snow in the background!
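A toy reproduction of the mechanism, not of the original experiment: the images are replaced by two hypothetical binary features (`snow`, `light_eyes`), and the deep model by a decision stump that picks the single most accurate feature on the training set. Because every training wolf happens to be photographed in snow, the stump learns "snow" rather than anything about the animal.

```python
def train_stump(examples, features):
    """Pick the single binary feature (and polarity) with the best training
    accuracy; the returned classifier predicts True (wolf) when it matches."""
    best = None
    for f in features:
        for polarity in (True, False):
            correct = sum(1 for x, label in examples if (x[f] == polarity) == label)
            if best is None or correct > best[2]:
                best = (f, polarity, correct)
    feature, polarity, _ = best
    return lambda x: x[feature] == polarity

# Hypothetical training set with an accidental bias: all wolves in snow.
train = [
    ({"snow": True, "light_eyes": False}, True),   # wolf
    ({"snow": True, "light_eyes": False}, True),   # wolf
    ({"snow": True, "light_eyes": True}, True),    # wolf
    ({"snow": True, "light_eyes": False}, True),   # wolf
    ({"snow": False, "light_eyes": True}, False),  # husky
    ({"snow": False, "light_eyes": True}, False),  # husky
    ({"snow": False, "light_eyes": True}, False),  # husky
    ({"snow": False, "light_eyes": False}, False), # husky
]
is_wolf = train_stump(train, ["snow", "light_eyes"])
print(is_wolf({"snow": True, "light_eyes": True}))    # husky in snow -> True (wrong)
print(is_wolf({"snow": False, "light_eyes": False}))  # wolf on grass -> False (wrong)
```

The stump is perfectly accurate on its training data, which is exactly why accuracy alone does not reveal that it has learned the background.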

  36. Deep learning is creating computer systems we don't fully understand

  37. Transparent algorithms to build trust • Systems that make recommendations to humans who are taking a decision should explain why
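For a linear scoring model, an explanation can be read off directly: each feature's contribution is its weight times its value. The sketch below assumes such a model with hypothetical weights for a player rating (the feature names and numbers are invented). Black-box models such as deep networks need post-hoc explanation methods instead, which this sketch does not cover.

```python
def explain_linear_score(weights, bias, features):
    """Return the score and the per-feature contributions (weight * value),
    ranked by absolute contribution, largest first."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    score = bias + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked

# Hypothetical rating model and one player's match statistics.
weights = {"goals": 0.5, "key_passes": 0.1, "fouls": -0.2}
player = {"goals": 1, "key_passes": 4, "fouls": 3}
score, ranked = explain_linear_score(weights, 6.0, player)
print(round(score, 2))               # 6.3
print([name for name, _ in ranked])  # ['fouls', 'goals', 'key_passes']
```

Ranking by absolute contribution surfaces the feature that moved the score most, whether up or down, which is the part a human decision-maker most needs to see.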

  38. Delft, 17–19 February 2016


  40. Soccer Player Ratings: How do humans evaluate sports performance?

  41. [Figure: panels for Goalkeepers, Defenders, Midfielders and Forwards (FW); labels include "goals suff." (goals suffered) and "goal diff" (goal difference)]


  43. [Figure: machine performance compared with the human evaluation line, using technical features vs. technical + contextual features]

  44. Social Mining & Big Data Analytics (H2020), www.sobigdata.eu, September 2015 – August 2019
