GETTING DATA PROTECTION RIGHT
Prof. dr. Mireille Hildebrandt
Interfacing Law & Technology, Vrije Universiteit Brussel
Smart Environments, Data Protection & the Rule of Law, Radboud University
21/2/17, Hildebrandt, SNS seminar, Stockholm
what's next?
1. From online to onlife
2. Machine Learning
3. Data Protection
what's next?
1. From online to onlife
online → onlife
■ internet: packet switching & routing, network structure
■ world wide web: hyperlinking
■ search engines, blogs, social media, web portals
■ web platforms [network effects & filter bubbles; reputation & fake news]
■ mobile applications [moving towards IoT, wearables]
■ IoT: cyberphysical infrastructures [connected cars, smart energy grids]
■ cloud computing, fog computing & edge computing
onlife: data-driven agency
■ creating added value from big data or small data
■ predicting behaviours
■ pre-empting behaviours
■ interplay of backend & frontend of computing systems
■ interfaces enable, but they also hide, nudge and force [A/B testing, 'by design' paradigms]
onlife: the digital unconscious
Big Data Space:
■ accumulation of behavioural and other data
■ mobile and polymorphous data & hypothesis spaces
■ distributed storage [once data has been shared, control becomes a challenge]
■ distributed access [access to data or to the inferences, to training sets & algorithms]
onlife: the digital unconscious
Big Data Space: the envelope of big data space drives human agency, providing convenience & resilience
Weiser's calm computing, IBM's autonomic computing: increasing dependence on the dynamics of interacting data-driven cyberphysical systems
what's next?
2. Machine Learning
big data, open data, personal data
■ BIG
– volume (but 'n = all' is nonsense)
– variety (unstructured in the sense of different formats)
– velocity (real time, streaming)
■ OPEN: as opposed to proprietary? reuse? repurposing? public-private?
– creating added value is hard work, not self-evident, no guaranteed return on investment
■ PERSONAL data: the IoT will contribute to a further explosion of personal data
– high risk, high gain (think DPIA)? anonymisation will mostly be pseudonymisation! (see the sketch below)
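To make the last point concrete, here is a minimal, hypothetical sketch (names, salt and record layout are illustrative, not from the slides): replacing a direct identifier with a hash produces a pseudonym, but the records remain linkable to the same person, so the data are pseudonymised rather than anonymised and stay within the scope of data protection law.

```python
import hashlib

def pseudonymise(identifier: str, salt: str = "project-salt") -> str:
    """Replace a direct identifier with a deterministic pseudonym (hash)."""
    return hashlib.sha256((salt + identifier).encode()).hexdigest()

records = [
    {"id": "19750312-1234", "visits": 4},
    {"id": "19750312-1234", "visits": 7},   # same person, different event
]

# The direct identifier is gone, but both rows still map to the same
# pseudonym: the person can still be singled out and linked across records,
# which is pseudonymisation, not anonymisation.
for r in records:
    print(pseudonymise(r["id"]), r["visits"])
```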
machine learning (ML)
"We say that a machine learns:
– with respect to a particular task T,
– performance metric P, and
– type of experience E,
if
– the system reliably improves its performance P
– at task T,
– following experience E." (Tom Mitchell)
http://www.cs.cmu.edu/~tom/mlbook.html
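A minimal sketch of Mitchell's definition, assuming scikit-learn is available (the dataset and model are illustrative choices, not from the slides): task T is digit classification, performance P is test accuracy, experience E is the number of labelled examples seen; accuracy should typically rise as E grows.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n in (50, 200, 800):                              # growing experience E
    model = LogisticRegression(max_iter=2000)
    model.fit(X_train[:n], y_train[:n])               # learn from n examples
    print(n, round(model.score(X_test, y_test), 3))   # performance P at task T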
types of machine learning
■ supervised (learning from examples – requires labelling, domain expertise)
■ reinforcement (learning by correction – requires prior domain expertise)
■ unsupervised (bottom-up, inductive – danger of overfitting)
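A rough illustration of the supervised versus unsupervised distinction, assuming scikit-learn (reinforcement learning is omitted for brevity): the classifier needs the labels y, i.e. prior domain expertise, while the clustering algorithm only sees X and induces structure bottom-up, without any guarantee that the induced clusters mean anything.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

supervised = DecisionTreeClassifier().fit(X, y)        # learns from labelled examples
unsupervised = KMeans(n_clusters=3, n_init=10).fit(X)  # finds clusters without labels

print(supervised.predict(X[:5]))   # predictions grounded in the given labels
print(unsupervised.labels_[:5])    # induced cluster ids; their meaning is not given
```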
bias, optimisation, spurious correlations
■ 1. have a network trained to recognize animal faces
■ 2. present it with a picture of a flower
■ 3. run the algorithm
■ 4. check the output (see what it sees)
http://www.nature.com/news/can-we-open-the-black-box-of-ai-1.20731
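A hedged sketch of this exercise, assuming a recent torchvision: an ImageNet-pretrained ResNet stands in for the animal-face network mentioned on the slide, and flower.jpg is a hypothetical local file. The point is only that the model still returns some label with some confidence for an input far outside what it was trained for.

```python
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
img = preprocess(Image.open("flower.jpg")).unsqueeze(0)    # step 2: present a flower

with torch.no_grad():
    probs = torch.softmax(model(img), dim=1)               # step 3: run the model
top_prob, top_class = probs.max(dim=1)
print(top_class.item(), round(top_prob.item(), 3))         # step 4: check what it 'sees'
```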
Wolpert: no free lunch theorem
Where d = training set; f = 'target' input-output relationships; h = hypothesis (the algorithm's guess for f made in response to d); and C = off-training-set 'loss' associated with f and h ('generalization error').
How well you do is determined by how 'aligned' your learning algorithm P(h|d) is with the actual posterior, P(f|d).
Check http://www.no-free-lunch.org
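For readers who want the formula itself, below is a hedged LaTeX reconstruction of the result the slide paraphrases; the exact decomposition (and the symbol c for a particular loss value) is my addition, following Wolpert's extended Bayesian framework rather than the slide.

```latex
% Hedged reconstruction, not taken verbatim from the slide:
% d = training set, f = target relationship, h = the learner's hypothesis,
% C = off-training-set loss ('generalization error').
\[
  \Pr(C = c \mid d) \;=\; \sum_{h,\,f} \Pr(c \mid h, f, d)\, P(h \mid d)\, P(f \mid d)
\]
% The expected loss thus depends on how well the learner P(h|d) is 'aligned'
% with the actual posterior P(f|d); averaged uniformly over all possible f,
% every learning algorithm does equally well: hence 'no free lunch'.
```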
Wolpert: no free lunch theorem
Summary:
– the bias that is necessary to mine the data will co-determine the results
– this relates to the fact that the data used to train an algorithm is finite
– 'reality', whatever that is, escapes the inherent reduction
– data is not the same as what it refers to or what it is a trace of
trade-offs
■ NFL theorem
– overfitting, overgeneralization (sketched below)
■ training set, domain knowledge, hypotheses space, test set
– accuracy, precision, speed, iteration
■ low hanging fruit
– may be cheap and/or available, but not very helpful
■ neither data nor algorithms are objective
– bias in the data, bias of the algorithms, guess what: bias in the output
■ the more data, the larger the hypotheses space, the more patterns
– spurious correlations, computational artefacts
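A small sketch of the overfitting trade-off, assuming scikit-learn (dataset and model are illustrative): an unconstrained decision tree memorises the training set (train accuracy close to 1.0), while its test accuracy typically does not keep up with a depth-limited tree.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (None, 3):   # None = grow until the leaves are pure (prone to overfitting)
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth,
          round(tree.score(X_tr, y_tr), 3),   # training accuracy
          round(tree.score(X_te, y_te), 3))   # test accuracy
```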
data hoarding & obesity
■ data obesity: lots of data, but often incorrect, incomplete, irrelevant (low hanging fruit)
– any personal data stored presents security and other risks (need for DPIA, DPbD)
– purpose limitation is crucial: select before you collect (and while, and after)
■ pattern obesity: trained algorithms can see patterns anywhere, but what is the added value?
– training sets and algorithms necessarily contain bias, and this may be problematic (need for DPIA, DPbD)
– purpose limitation is crucial: to prevent spurious correlations and to test relevance (see the toy example below)
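A toy illustration of 'pattern obesity', assuming only numpy (all numbers are arbitrary): none of the 1,000 noise features has anything to do with the target, yet several of them will look 'predictive' simply because so many hypotheses are tested against a finite sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 100, 1000
X = rng.normal(size=(n_samples, n_features))   # a 'data lake' of pure noise
y = rng.normal(size=n_samples)                 # a target unrelated to any feature

# Correlate every feature with the target and count the apparent 'finds'.
corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_features)])
print((np.abs(corrs) > 0.25).sum(), "features look 'predictive' purely by chance")
```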
agile and lean computing
■ agile software development:
– iteration instead of waterfall
– collaboration between domain experts, data scientists and whoever invests
– initial purpose (prediction of behaviour, e.g. tax office, car insurance)
– granular purposing (testing specific patterns, A/B testing to nudge specific behaviour)
■ lean computing:
– less data = more effective & more efficient
■ methodological integrity:
– make your software testable and contestable: mathematical & empirical software verification
– secure logging, open source (a minimal sketch below)
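One possible reading of 'secure logging' is a tamper-evident audit trail; the following is a minimal, hypothetical hash-chained log (function names and event fields are my own), in which every entry commits to the previous one so that later alteration of any record breaks the chain and becomes detectable, supporting contestability.

```python
import hashlib, json, time

def append_entry(log: list, event: dict) -> None:
    """Append an event, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"ts": time.time(), "event": event, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append({**body, "hash": digest})

def verify(log: list) -> bool:
    """Recompute the chain; any tampered field breaks it."""
    prev = "0" * 64
    for entry in log:
        body = {k: entry[k] for k in ("ts", "event", "prev")}
        recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev"] != prev or recomputed != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = []
append_entry(log, {"action": "model_trained", "dataset": "v1"})
append_entry(log, {"action": "prediction_served", "subject": "pseudonym-42"})
print(verify(log))   # True; editing any logged field afterwards would make this False
```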
what's next?
3. Data Protection Law
privacy and autonomy
■ the implications of pre-emptive computing:
– A/B testing & nudging
– pre-emption of our intent, playing with our autonomy
– we become subject to the decisions of data-driven agents
– this choice architecture may generate manipulability
non-discrimination
■ three types of bias:
– bias inherent in any action-perception system (APS)
– bias that some would qualify as unfair
– bias that discriminates on the basis of prohibited legal grounds
the opacity argument in ML:
1. intentional concealment
– trade secrets, IP rights, public security
2. we have learned to read and write, not to code or do machine learning
– monopoly of the new 'clerks', the end of democracy
3. mismatch between mathematical optimization and human semantics
– when it comes to law and justice we cannot settle for 'computer says no'
inspired by: Jenna Burrell, 'How the machine "thinks": Understanding opacity in machine learning algorithms', Big Data & Society, January-June 2016, 1-12