
Does De-identification Work? Khaled El Emam, CHEO RI & uOttawa



  1. Does De-identification Work? Khaled El Emam, CHEO RI & uOttawa
     Key Points
     • The evidence that it is easy to re-identify health data or that current de-identification methods do not work is quite weak
     • There are powerful de-identification techniques in use today that can provide strong guarantees and are defensible under existing standards
     • It is very difficult to re-identify data that has been properly de-identified
     • We have a “poor de-identification” problem
     • There are defensible de-identification methods that retain data utility
     Electronic Health Information Laboratory, CHEO Research Institute, 401 Smyth Road, Ottawa K1H 8L1, Ontario; www.ehealthinformation.ca

  2. Broad Claims
     • The “easy re-identification” hypothesis
     • This has had some impact on policy makers
     • Such claims need to be examined in a systematic way because the implications are very serious:
       – May make it necessary to obtain patient consent every time data is used for secondary purposes
       – Discourages de-identification, and therefore more identifiable information will be used and disclosed
       – The likelihood of reportable data breaches would increase, leading to erosion of patient trust
     Examples Often Used: Governor Weld of MA
     • Insurance claims data was matched with a voter registration list
     • Both databases had full date of birth, ZIP5, and gender
     • Both databases were publicly available for free or a nominal fee
     • The claims belonging to the governor were re-identified
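The matching described in the Weld example can be sketched as an exact join on the three shared fields. This is only an illustration under stated assumptions: the records, the field names (`dob`, `zip5`, `gender`, `claim_id`, `name`), and the helper `link_unique` are all hypothetical and do not come from the original attack. A claim counts as re-identified here only when exactly one voter shares its combination of values.

```python
from collections import defaultdict

# Quasi-identifiers named on the slide: full date of birth, ZIP5, gender.
QUASI_IDENTIFIERS = ("dob", "zip5", "gender")


def quasi_key(record):
    """Build the linkage key from a record's quasi-identifier values."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link_unique(claims, voters):
    """Map claim_id -> voter name where exactly one voter shares the key."""
    voters_by_key = defaultdict(list)
    for voter in voters:
        voters_by_key[quasi_key(voter)].append(voter)

    matches = {}
    for claim in claims:
        candidates = voters_by_key.get(quasi_key(claim), [])
        # Zero or several candidates means no unambiguous match.
        if len(candidates) == 1:
            matches[claim["claim_id"]] = candidates[0]["name"]
    return matches


# Hypothetical data, invented purely for illustration.
claims = [
    {"claim_id": "c1", "dob": "1950-01-15", "zip5": "02139", "gender": "M"},
    {"claim_id": "c2", "dob": "1962-07-04", "zip5": "02139", "gender": "F"},
    {"claim_id": "c3", "dob": "1962-07-04", "zip5": "02139", "gender": "F"},
]
voters = [
    {"name": "Voter One", "dob": "1950-01-15", "zip5": "02139", "gender": "M"},
    {"name": "Voter Two", "dob": "1962-07-04", "zip5": "02139", "gender": "F"},
    {"name": "Voter Three", "dob": "1962-07-04", "zip5": "02139", "gender": "F"},
]

print(link_unique(claims, voters))
```

With this made-up data only claim `c1` is linked, because two voters share the key of `c2` and `c3`. The same logic suggests why coarser values (for example, year of birth instead of full date) leave fewer unique matches.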

  3. Examples Often Used: AOL Search Queries
     • An AOL researcher made search queries available online to the research community
     • USERIDs were replaced with persistent pseudonyms
     • NYT reporters were able to re-identify Thelma Arnold based on her search queries
     • There are various other unsubstantiated claims of other people being re-identified
     Examples Often Used: Neuroblastoma Registry (Illinois)
     • A newspaper made an access request for the cancer registry (rare and in children)
     • The public health unit argued that this is identifiable information
     • They went to court, all the way to the state supreme court
     • An expert witness was apparently able to re-identify most of the records in the cancer registry

  4. Examples Often Used: Netflix Movie Ratings Data Competition
     • Netflix holds a $1m competition for a recommendation system that is better than theirs
     • A large data set of movie ratings is made publicly available for the entrants
     • Researchers claim to have re-identified individuals in the data set
     • Netflix cancels a second competition
     Examples Often Used: CBC Adverse Drug Event Database
     • CBC obtained the ADE database through an access request
     • Health Canada claims that CBC matched the database with obituaries to re-identify a record, and broadcast that in a program
     • CBC asked for more data; Health Canada reduced the details
     • They went to federal court

  5. Our Objectives
     • We wanted to examine the empirical evidence of re-identification attacks on health information through a systematic review
     • The objectives were:
       – characterize known re-identification attacks on health data and contrast that to re-identification attacks on other kinds of data,
       – compute the overall proportion of records that have been correctly re-identified in these attacks, and
       – assess whether these demonstrate weaknesses in current de-identification methods
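The second objective, an overall proportion of correctly re-identified records, can be illustrated with a simple pooled estimate: total records re-identified divided by total records attacked. This is a sketch under assumptions, not the review's actual method, which may weight studies differently. The function name `pooled_proportion` and the counts below are hypothetical and are not results from the review.

```python
def pooled_proportion(attacks):
    """Pooled proportion over (reidentified, total_records) pairs.

    A plain ratio of sums; a formal meta-analysis might instead weight
    or transform the per-study proportions.
    """
    total_records = sum(total for _, total in attacks)
    if total_records == 0:
        raise ValueError("no records to pool")
    total_reidentified = sum(hit for hit, _ in attacks)
    return total_reidentified / total_records


# Hypothetical (reidentified, total_records) counts, for illustration only.
hypothetical_attacks = [(2, 15000), (0, 5000), (30, 100)]

print(f"pooled proportion: {pooled_proportion(hypothetical_attacks):.5f}")
```

A ratio of sums lets large data sets dominate the result, so a small study with a high success rate changes the pooled figure very little. That is one reason per-study proportions are usually reported alongside it.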

  6. Methodological Considerations
     • Systematic reviews and meta-analyses are a very common way of combining evidence across multiple studies; they have been in use for many decades in multiple disciplines
     • The methodology is quite standardized and addresses issues such as publication bias and heterogeneity in studies
     • It is well known that single studies are unreliable; practice recommendations should come from systematic reviews
     • Computer scientists and lawyers do not get a free pass
     PRISMA
     [figure]

  7. Main Observations - I
     • Up to October 2010 there were 14 published re-identification attacks on health and non-health data
     • Only 10 described the methodology used in the attack
     • 6/14 were on health data
     • 11/14 were conducted by researchers as demonstration attacks
     • Most attacks were conducted in the US
     Main Observations - II
     • 2/14 attacks followed existing standards; most of the attacked data was not de-identified in a defensible way
     • The only study that followed existing standards had a success rate of 0.00013 (ONC study)
     • All attacks are identity disclosure

  8. HIPAA Safe Harbor
     Safe Harbor Direct Identifiers and Quasi-identifiers
     1. Names
     2. ZIP Codes (except first three)
     3. All elements of dates (except year)
     4. Telephone numbers
     5. Fax numbers
     6. Electronic mail addresses
     7. Social security numbers
     8. Medical record numbers
     9. Health plan beneficiary numbers
     10. Account numbers
     11. Certificate/license numbers
     12. Vehicle identifiers and serial numbers, including license plate numbers
     13. Device identifiers and serial numbers
     14. Web Universal Resource Locators (URLs)
     15. Internet Protocol (IP) address numbers
     16. Biometric identifiers, including finger and voice prints
     17. Full face photographic images and any comparable images
     18. Any other unique identifying number, characteristic, or code
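Items 2 and 3 of the list generalize a value rather than removing it, which a short sketch can make concrete. Everything here is an assumption for illustration: the field names, the ISO `YYYY-MM-DD` date format, and the function `apply_safe_harbor_sketch` are hypothetical, and only a handful of the 18 identifier types are covered. The full Safe Harbor rule also has further conditions (for example, for sparsely populated three-digit ZIP areas and for ages over 89) that this sketch does not handle.

```python
# Hypothetical field names for a few direct identifiers; a real schema differs.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "phone", "fax", "email", "ssn", "medical_record_number",
}


def apply_safe_harbor_sketch(record):
    """Drop some direct identifiers and generalize ZIP code and date.

    Assumes 'zip' is a 5-digit string and 'birth_date' is 'YYYY-MM-DD'.
    """
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}
    if "zip" in out:
        out["zip"] = str(out["zip"])[:3]  # keep only the first three digits
    if "birth_date" in out:
        out["birth_year"] = str(out.pop("birth_date"))[:4]  # keep only the year
    return out


# Hypothetical record, invented purely for illustration.
record = {
    "name": "Example Person",
    "ssn": "000-00-0000",
    "zip": "02139",
    "birth_date": "1950-01-15",
    "diagnosis": "A",
}

print(apply_safe_harbor_sketch(record))
```

The clinical field (`diagnosis`) passes through unchanged, while the ZIP code and date of birth survive only in coarsened form.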
