De-Identifying Education Data
Future of Privacy Forum Webinar
October 13, 2017
MICHAEL HAWES, Director of Student Privacy Policy, U.S. Department of Education
United States Department of Education, Student Privacy Policy and Assistance Division
What is PII?
Personally Identifiable Information: Captain Hook
Personally Identifiable Information: A one-handed pirate with an irrational fear of crocodiles and ticking clocks
FERPA: Personally Identifiable Information (PII)
• Direct Identifiers
  • e.g., Name, SSN, Student ID Number, etc. (1:1 relationship to student)
• Indirect Identifiers
  • e.g., Birthdate, Demographic Information (1:Many relationship to student)
• “Other information that, alone or in combination, is linked or linkable to a specific student that would allow a reasonable person in the school community, who does not have personal knowledge of the relevant circumstances, to identify the student with reasonable certainty.” (§ 99.3)
FERPA’s Confidentiality Standard
Can a “reasonable person” in the school community re-identify the individual with any reasonable certainty?
Tabular Data: A small degree of uncertainty (“reasonable doubt”) is often sufficient [e.g., “the rule of 3”].
Individual-level Data: The abundance of data points for each individual, the availability of easy-to-use data-manipulation and data-mining tools, and the ability to link to external data sources make the risk of re-identification much higher.
FERPA vs. HIPAA’s “Safe Harbor”
PII? But I’m only releasing aggregate data …
Aggregate data tables can still contain PII if they report information on small groups, or on individuals with unique or uncommon characteristics.
How States are Doing It
Under ESEA, States adopted minimum n-size rules to protect student privacy in aggregate reports, but there is substantial variation across States on the minimum n-size selected.
[Figure: State Adoption of Sub-Group Suppression Rules (circa 2012). Bar chart of # of States (y-axis, 0 to 35) by Suppression Rule (n < X) (x-axis, X = 1 to 30).]
There is also substantial variation in how States have interpreted and implemented those minimum n-size requirements.
Small cells increase disclosure risk… BUT, suppressing the small cells may not be sufficient
Common Mistakes in Public Reporting
Population Size vs. Cell Size
Assume a minimum n-size rule of 5:
Subgroup      # Tested   # Proficient   % Proficient
Subgroup 1    6          1              16.7%
Population Size vs. Cell Size
Assume a minimum n-size rule of 5:
Subgroup      # Tested   # Proficient   % Proficient
Subgroup 1    6          1              16.7%
What if I’m that 1 student? I now know something about the other 5!
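The insider's inference on this slide can be sketched in a few lines. This is only an illustration of the arithmetic; the function name `what_insider_learns` and its parameters are hypothetical and not part of the presentation.

```python
def what_insider_learns(n_tested: int, n_proficient: int, insider_is_proficient: bool):
    """Return (number of other students, how many of them are proficient),
    as seen by one student in the group who knows their own result."""
    others = n_tested - 1
    others_proficient = n_proficient - (1 if insider_is_proficient else 0)
    return others, others_proficient


# The slide's example: 6 tested, 1 proficient, and I am the proficient one.
others, others_proficient = what_insider_learns(6, 1, insider_is_proficient=True)
print(f"{others_proficient} of the other {others} students are proficient")
```

The group of 6 passes a minimum n-size rule of 5, yet the published row still tells the one proficient student the result of every classmate.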
Fixed Top/Bottom Coding Thresholds
Assume a minimum n-size rule of 5:
Subgroup      # Tested   # Proficient   % Proficient
Subgroup 1    8          *              <5%
Fixed Top/Bottom Coding Thresholds
Assume a minimum n-size rule of 5:
Subgroup      # Tested   # Proficient   % Proficient
Subgroup 1    8          *              <5%
0/8 = 0%
1/8 = 12.5%
So, “<5%” of 8 students = 0 students!
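One way to see why a fixed "<5%" threshold fails for small groups is to list every count that is consistent with the published bound. The sketch below does that; `counts_consistent_with` is a hypothetical helper written for this illustration.

```python
def counts_consistent_with(n_tested: int, upper_pct: float) -> list:
    """Return every count k (0..n_tested) whose percentage k/n_tested
    is strictly below the published upper bound."""
    return [k for k in range(n_tested + 1) if 100 * k / n_tested < upper_pct]


# With 8 students, only a count of 0 fits "<5%", so the bound discloses the exact value.
print(counts_consistent_with(8, 5))         # [0]

# With 200 students, ten different counts (0 through 9) fit the same bound.
print(len(counts_consistent_with(200, 5)))  # 10
```

The same label that blurs a large group pins down a small one exactly, which is why the threshold needs to depend on the denominator.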
A Better Approach for Handling Extreme Values
Number of Students (denominator)   Top/Bottom Coding for Percentages
1-5                                Suppressed
6-15                               <50%, ≥50%
16-30                              ≤20%, ≥80%
31-60                              ≤10%, ≥90%
61-300                             ≤5%, ≥95%
301-3,000                          ≤1%, ≥99%
3,001 or more                      ≤0.1%, ≥99.9%
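The denominator-dependent thresholds in this table could be applied with a small lookup, sketched below. The tier values come from the slide; the function name `top_bottom_code` and the treatment of values between the two cutoffs (reported to one decimal place) are assumptions, since the slide does not say how mid-range percentages are published.

```python
# (largest denominator in tier, bottom cutoff %, top cutoff %), from the slide's table.
# None marks the open-ended last tier ("3,001 or more").
TIERS = [
    (30, 20, 80),
    (60, 10, 90),
    (300, 5, 95),
    (3000, 1, 99),
    (None, 0.1, 99.9),
]


def top_bottom_code(numerator: int, denominator: int) -> str:
    """Return the label to publish for numerator/denominator."""
    if denominator <= 5:
        return "Suppressed"
    pct = 100 * numerator / denominator
    if denominator <= 15:
        # Tier 6-15: only two bins are ever reported.
        return "<50%" if pct < 50 else "≥50%"
    for max_n, low, high in TIERS:
        if max_n is None or denominator <= max_n:
            if pct <= low:
                return f"≤{low}%"
            if pct >= high:
                return f"≥{high}%"
            # Assumption: mid-range values are published as-is, to one decimal.
            return f"{pct:.1f}%"


print(top_bottom_code(0, 8))      # <50%
print(top_bottom_code(1, 20))     # ≤20%
print(top_bottom_code(299, 300))  # ≥95%
```

Under this scheme the 0-of-8 case from the earlier slide is published as "<50%", which is consistent with counts 0 through 3 rather than 0 alone.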
What’s the missing number? 12, 8, 14, ?, 6
What’s the missing number? 12, 8, 14, ?, 6 (total: 44)
What’s the missing number? 12, 8, 14, 4, 6 (total: 44)
What’s the missing number? 12, 8, 14, [CENSORED], 6 (total: 44 [CENSORED])
What’s the missing number? Students by Subgroup: 12, 8, 14, 4, 6 (total: 44). Students by Gender: 20, 24.
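The "missing number" slides amount to a subtraction attack: a single suppressed cell can be recovered from any total that covers it. The sketch below is illustrative only; `recover_suppressed` is a hypothetical helper, and reading the gender figures (20 and 24) as a second breakdown of the same 44 students is an assumption based on the slide.

```python
def recover_suppressed(cells, total):
    """cells: list of counts where exactly one entry is None (suppressed).
    Returns the suppressed value implied by the total."""
    if sum(c is None for c in cells) != 1:
        raise ValueError("recovery by subtraction needs exactly one suppressed cell")
    return total - sum(c for c in cells if c is not None)


subgroups = [12, 8, 14, None, 6]

# Published total alongside the suppressed cell.
print(recover_suppressed(subgroups, 44))       # 4

# Total suppressed too, but another breakdown of the same students adds up to it.
print(recover_suppressed(subgroups, 20 + 24))  # 4
```

Censoring the total does not help if the same total can be rebuilt from another table.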
Lack of Complementary Suppression
Subgroup       # Tested   Advanced   Proficient   Basic   Below Basic
Subgroup 1     11         0%         45%          36%     18%
Subgroup 2     1          *          *            *       *
All Students   12         0%         42%          42%     17%
Lack of Complementary Suppression
Subgroup       # Tested   Advanced   Proficient   Basic   Below Basic
Subgroup 1     11         0%         45%          36%     18%
Subgroup 2     1          *          *            100%    *
All Students   12         0%         42%          42%     17%
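The recovery of Subgroup 2's row can be reproduced by converting the published percentages back to counts and subtracting. This is a sketch of that arithmetic using the slide's numbers; the helper name `pct_to_counts` is hypothetical.

```python
def pct_to_counts(n_tested, percentages):
    """Convert rounded percentages back to whole-student counts."""
    return [round(n_tested * p / 100) for p in percentages]


levels = ["Advanced", "Proficient", "Basic", "Below Basic"]

all_students = pct_to_counts(12, [0, 42, 42, 17])  # [0, 5, 5, 2]
subgroup_1 = pct_to_counts(11, [0, 45, 36, 18])    # [0, 5, 4, 2]

# Whatever is left over must belong to the suppressed subgroup.
subgroup_2 = [a - b for a, b in zip(all_students, subgroup_1)]

for level, count in zip(levels, subgroup_2):
    print(level, count)
```

The only non-zero leftover is in the Basic column, so the single student in Subgroup 2 scored Basic even though the whole row was suppressed.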
The Trouble with Cell Size Rules
Remember: It’s not just the small cells that are important. Bigger cells/values can still be disclosive if:
• they are extreme values (e.g., ~0% or ~100% of students in a group), or
• they can be used to calculate the values of protected cells elsewhere (in the same table, or even in another data release!)
Take Home Point: Consider All Reporting Levels
Education data are often reported in a multi-dimensional structure. To be effective, a disclosure avoidance methodology must consider all levels of aggregation.
Complementary Suppression
When using suppression to protect privacy, consider all the ways that data are aggregated.
Levels of aggregation: Subgroup, Grade, School, District, State, National
If a cell is suppressed (primary or complementary) at one level, it needs to be suppressed in at least one other reporting entity at the next level of aggregation. And make sure that those additional entities have proper complementary suppression too!
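A simple automated check for the rule on this slide is to look for any parent entity that has exactly one suppressed child, since that child can be recovered by subtracting its siblings from the parent's total. The sketch below assumes a nested dictionary of counts with `None` marking a suppressed cell; the function name and the district and school names are hypothetical.

```python
def lone_suppressions(children_by_parent):
    """Return the parents that have exactly one suppressed (None) child.
    Such a child is recoverable from the parent total minus its siblings."""
    return [
        parent
        for parent, children in children_by_parent.items()
        if sum(value is None for value in children.values()) == 1
    ]


# Hypothetical school-level counts grouped by district.
report = {
    "District A": {"School 1": 120, "School 2": None, "School 3": 95},
    "District B": {"School 4": None, "School 5": None, "School 6": 210},
}

print(lone_suppressions(report))  # ['District A']
```

Passing this check is necessary but not sufficient: it covers one level of one table, so it would need to be repeated at every level of aggregation and across related releases.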
Take Home Point: Data Releases by Others
When performing a disclosure risk analysis, you must consider data releases made by other organizations. How schools, districts, states, and the Federal government release the same (or related) data may affect the re-identifiability of the data you (or they) release!
So What Are Your Options?
The 3 “Flavors” of Disclosure Avoidance Techniques:
• Suppression
• “Blurring”
• Perturbation
Suppression
Definition: Removing data to prevent the identification of individuals in small cells or with unique characteristics
Examples: Cell Suppression, Row Suppression, Sampling
Effect on Data Utility:
• Results in very little data being produced for small populations
• Requires suppression of additional, non-sensitive data (e.g., complementary suppression)
Residual Risk of Disclosure:
• Suppression can be difficult to perform correctly (especially for large multi-dimensional tables)
• If additional data is available elsewhere, the suppressed data may be re-calculated.
“Blurring”
Definition: Reducing the precision of data that is presented to reduce the certainty of identification
Examples: Aggregation, Percents, Ranges, Top/Bottom-Coding, Rounding
Effect on Data Utility:
• Users cannot make inferences about small changes in the data
• Reduces the ability to perform time-series or cross-case analysis
Residual Risk of Disclosure:
• Generally low risk, but if row/column totals are published (or available elsewhere) then it may be possible to calculate the actual values of sensitive cells