Privacy through Accountability: A Computer Science Perspective
Anupam Datta
Associate Professor, Computer Science, ECE, CyLab
Carnegie Mellon University
February 2014
Personal Information is Everywhere
Research Challenge
Programs and people: ensure that organizations respect privacy expectations in the collection, use, and disclosure of personal information.
Web Privacy
Example privacy policies:
- Do not use detailed location (full IP address) for advertising
- Do not use race for advertising
Healthcare Privacy
(Diagram: patient information flows from patients to the hospital's nurses and physicians, and on to an auditor and a drug company.)
Example privacy policies:
- Use patient health information only for treatment and payment
- Share patient health information with police if a crime is suspected
A Research Area
Formalize privacy policies:
- Precise semantics of privacy concepts (restrictions on personal information flow)
Enforce privacy policies through audit and accountability:
- Detect violations
- Blame assignment
- Adaptive audit resource allocation
Related ideas: Barth et al., Oakland 2006; May et al., CSFW 2006; Weitzner et al., CACM 2008; Lampson 2004
Today: Focus on Detection
- Healthcare privacy: a play in two acts
- Web privacy: a play in two (brief) acts
Example from the HIPAA Privacy Rule
"A covered entity may disclose an individual's protected health information (phi) to law-enforcement officials for the purpose of identifying an individual if the individual made a statement admitting participation in a violent crime that the covered entity believes may have caused serious physical harm to the victim."
Concepts in privacy policies:
- Black-and-white concepts:
  - Actions: send(p1, p2, m)
  - Roles: inrole(p2, law-enforcement)
  - Data attributes: attr_in(prescription, phi)
  - Temporal constraints: in-the-past(state(q, m))
- Grey concepts:
  - Purposes: purp_in(u, id-criminal)
  - Beliefs: believes-crime-caused-serious-harm(p, q, m)
Detecting Privacy Violations
- Automated audit for black-and-white policy concepts
  - Complete formalization of the HIPAA Privacy Rule and GLBA
- Oracles to audit for grey policy concepts
(Diagram: a computer-readable privacy policy and the organizational audit log feed an audit that detects policy violations.)
(Aside: "The Oracle" -- Matrix character; species: computer program; title: "A program designed to investigate the human psyche.")
Policy Auditing over Incomplete Logs
With D. Garg (CMU/MPI-SWS) and L. Jia (CMU)
2011 ACM Conference on Computer and Communications Security
Key Challenge for Auditing: Audit Logs Are Incomplete
- Future: logs store only past and current events
  Example: timely data-breach notification refers to a future event
- Subjective: no "grey" information
  Example: logs may not record evidence for purposes and beliefs
- Spatial: remote logs may be inaccessible
  Example: logs distributed across different departments of a hospital
Abstract Model of Incomplete Logs
- Model all incomplete logs uniformly as 3-valued structures
- Define semantics (meanings of formulas) over 3-valued structures
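A minimal sketch (not the paper's implementation) of what a 3-valued semantics looks like: each atomic predicate is True or False if the log decides it, and Unknown (here `None`) otherwise, with connectives evaluated in Kleene's strong three-valued logic.

```python
# Kleene's strong three-valued logic over True / False / None (unknown).
# Incomplete logs decide some atoms and leave the rest unknown.

def t_and(a, b):
    # A definite False dominates; otherwise an unknown propagates.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def t_or(a, b):
    # A definite True dominates; otherwise an unknown propagates.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def t_not(a):
    # Negation of unknown stays unknown.
    return None if a is None else (not a)
```

For example, `t_and(True, None)` is `None`: a policy conjunct cannot be discharged until the log decides the unknown atom, while `t_and(False, None)` is already a definite `False`.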
reduce: The Iterative Algorithm
reduce(L, φ) = φ'
(Diagram: as logs grow over time, reduce is applied iteratively, rewriting the policy φ0 into residual policies φ1, φ2, ...)
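The iteration above can be sketched as follows. This is a drastically simplified, hypothetical version of reduce: the residual "policy" is just a set of atomic obligations, the atom names are invented, and the real algorithm rewrites full first-order formulas rather than atom lists.

```python
# One reduce step over an incomplete log: atoms the log decides True are
# discharged, an atom decided False signals a violation, and undecided
# atoms are carried over to be checked against the next, larger log.

def reduce_step(log, residual_atoms):
    carried, violation = [], False
    for atom in residual_atoms:
        verdict = log.get(atom)        # True / False / None (unknown)
        if verdict is False:
            violation = True           # definite policy violation
        elif verdict is None:
            carried.append(atom)       # re-check when more log arrives
    return carried, violation

# Illustrative atoms: "tagged-phi" is decided now, "sent-notice" is a
# future event the current log cannot yet decide.
log1 = {"tagged-phi": True}
residual, bad = reduce_step(log1, ["sent-notice", "tagged-phi"])
# residual carries "sent-notice" forward; no violation so far.
```

Iterating `reduce_step` as logs extend mirrors the φ0 → φ1 → φ2 picture: each pass shrinks the residual policy until it is fully discharged or a violation surfaces.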
Syntax of the Policy Logic
- First-order logic with restricted quantification over infinite domains (the challenge for reduce)
- Can express timed temporal properties and "grey" predicates
Example from the HIPAA Privacy Rule
"A covered entity may disclose an individual's protected health information (phi) to law-enforcement officials for the purpose of identifying an individual if the individual made a statement admitting participation in a violent crime that the covered entity believes may have caused serious physical harm to the victim."
∀ p1, p2, m, u, q, t.
  (send(p1, p2, m) ∧ inrole(p2, law-enforcement) ∧ tagged(m, q, t, u) ∧ attr_in(t, phi))
  ⊃ purp_in(u, id-criminal)
    ∧ ∃ m'. state(q, m') ∧ is-admission-of-crime(m') ∧ believes-crime-caused-serious-harm(p1, q, m')
reduce: Formal Definition
General theorem: if the initial policy passes a syntactic mode check, then finite satisfying substitutions can be computed. (c is a formula for which finite satisfying substitutions of x can be computed.)
Application: the entire HIPAA and GLBA Privacy Rules pass this check.
Example
Policy φ:
∀ p1, p2, m, u, q, t.
  (send(p1, p2, m) ∧ tagged(m, q, t, u) ∧ attr_in(t, phi) ∧ inrole(p2, law-enforcement))
  ⊃ purp_in(u, id-criminal)
    ∧ ∃ m'. (state(q, m') ∧ is-admission-of-crime(m') ∧ believes-crime-caused-serious-harm(p1, m'))
Substitutions computed from the log:
{ p1 → UPMC, p2 → allegeny-police, m → M2, q → Bob, u → id-bank-robber, t → date-of-treatment } and { m' → M1 }
Log:
- Jan 1, 2011: state(Bob, M1)
- Jan 5, 2011: send(UPMC, allegeny-police, M2); tagged(M2, Bob, date-of-treatment, id-bank-robber)
Residual policy:
φ' = purp_in(id-bank-robber, id-criminal) ∧ is-admission-of-crime(M1) ∧ believes-crime-caused-serious-harm(UPMC, M1)
Implementation and Case Study
- Implementation and evaluation over simulated audit logs for compliance with all 84 disclosure-related clauses of the HIPAA Privacy Rule
- Performance: average time to check compliance of each disclosure of protected health information is 0.12 s over a 15 MB log
- Mechanical enforcement: reduce can automatically check 80% of all atomic predicates
Ongoing Transition Efforts
- Integration of the reduce algorithm into an Illinois Health Information Exchange prototype (joint work with UIUC and Illinois HLN): auditing logs for policy compliance
- Ongoing conversations with Symantec Research
Related Work
Distinguishing characteristics:
1. General treatment of incompleteness in audit logs
2. Quantification over infinite domains (e.g., messages)
3. First complete formalization of the HIPAA Privacy Rule and GLBA
Nearest neighbors:
- Basin et al. 2010 (missing 1, weaker 2, cannot handle 3)
- Lam et al. 2010 (missing 1, weaker 2, cannot handle all of 3)
- Weitzner et al. (missing 1, cannot handle 3)
- Barth et al. 2006 (missing 1, weaker 2, did not do 3)
Formalizing and Enforcing Purpose Restrictions
With M. C. Tschantz (CMU/Berkeley) and J. M. Wing (CMU/MSR)
2012 IEEE Symposium on Security & Privacy
Goal
- Give a semantics for "not for" and "only for" purpose restrictions that is parametric in the purpose
- Provide an audit algorithm that detects violations under that semantics
Example: medical records used only for diagnosis
(Diagram: starting from a medical record, an x-ray is taken and added; sending the record to a drug company yields no diagnosis, while sending it to a specialist yields a diagnosis.)
(Diagram: the same example annotated: sending the record to the drug company does not achieve the purpose; sending it to the specialist achieves the purpose.)
(Diagram: sending the record is a choice point between the drug company and the specialist; even the best choice, the specialist, fails to produce a diagnosis with probability 1/4 and succeeds with probability 3/4.)
Planning
Thesis: an action is for a purpose iff that action is part of a plan for furthering the purpose, i.e., the plan always makes the best choice for furthering the purpose.
Auditing
(Diagram: the auditee's behavior and a decision-making model feed the audit, which reports that the purpose restriction was obeyed, was violated, or that the result is inconclusive.)
(Diagram: the policy, e.g., "record only for treatment," is modeled as an MDP; solving the MDP gives the optimal actions for each state, and a logged action that is not optimal, e.g., [..., send record], implies a violation.)
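A minimal sketch of that audit check, on an invented toy MDP (all state, action, and reward names here are illustrative, not the paper's model): solve the MDP by value iteration, then flag logged actions that are not optimal for the allowed purpose.

```python
# Toy MDP audit: an action violates a purpose restriction if it is not
# among the optimal actions for furthering the purpose at its state.

def value_iteration(states, actions, P, R, gamma=0.9, iters=50):
    # P[s][a] is a list of (next_state, probability); R[s][a] a reward.
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                    for a in actions[s])
             for s in states}
    return V

def optimal_actions(s, actions, P, R, V, gamma=0.9, eps=1e-9):
    # Actions whose Q-value is (near-)maximal at state s.
    q = {a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
         for a in actions[s]}
    best = max(q.values())
    return {a for a, v in q.items() if v >= best - eps}

# Illustrative model: sending the record to the specialist furthers the
# diagnosis purpose (expected reward 3/4); sending to the drug company
# furthers it not at all.
states = ["record", "done"]
actions = {"record": ["send-to-specialist", "send-to-drugco"], "done": ["stop"]}
P = {"record": {"send-to-specialist": [("done", 1.0)],
                "send-to-drugco": [("done", 1.0)]},
     "done": {"stop": [("done", 1.0)]}}
R = {"record": {"send-to-specialist": 0.75, "send-to-drugco": 0.0},
     "done": {"stop": 0.0}}

V = value_iteration(states, actions, P, R)
best = optimal_actions("record", actions, P, R, V)
# A logged "send-to-drugco" at state "record" is not in `best`: flagged.
```

The audit then reduces to a membership test: for each logged (state, action) pair, check whether the action lies in `optimal_actions` for that state.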
Summary: A Sense of Purpose
- Thesis: an action is for a purpose iff that action is part of a plan for furthering the purpose, i.e., the plan always makes the best choice for furthering the purpose
- The audit algorithm detects policy violations by checking whether the observed behavior could have been produced by an optimal plan
Today: Focus on Detection
- Healthcare privacy: a play in two acts
- Web privacy: a play in two (brief) acts
Bootstrapping Privacy Compliance in a Big Data System
With S. Sen (CMU) and S. Guha, S. Rajamani, J. Tsai, J. M. Wing (MSR)
2014 IEEE Symposium on Security & Privacy
Privacy Compliance for Bing
Setting: the auditor has access to source code
Two Central Challenges
1. Ambiguous privacy policy: meaning unclear
2. Huge undocumented codebases and datasets: connection to policy unclear
(Diagram: the legal team crafts the policy, the privacy champion interprets it, developers write code, and the audit team verifies compliance, with meetings between each pair.)
1. Legalease
- Clean syntax: layered allow-deny information-flow rules with exceptions
- Precise semantics: no ambiguity
- Focus on usability: a user study of Legalease with Microsoft privacy champions was promising
Example:
  DENY Datatype IPAddress
    USE FOR PURPOSE Advertising
  EXCEPT
    ALLOW Datatype IPAddress:Truncated
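A hypothetical sketch of how such layered allow-deny rules can be evaluated (the clause encoding and attribute names are mine, not Legalease's actual implementation): a policy is a tree of clauses, and the deepest EXCEPT clause that matches a flow overrides its parent.

```python
# Layered allow-deny evaluation: a clause is (decision, condition,
# exceptions); an applicable inner exception overrides the outer clause.

def evaluate(clause, flow):
    decision, cond, exceptions = clause
    if not cond(flow):
        return None                      # clause does not apply to this flow
    for exc in exceptions:
        verdict = evaluate(exc, flow)
        if verdict is not None:
            return verdict               # innermost matching clause wins
    return decision

# Encoding of the example above: deny IP addresses used for advertising,
# except allow truncated IP addresses.
ip_for_ads = (
    "DENY",
    lambda f: f["datatype"] == "IPAddress" and f["purpose"] == "Advertising",
    [("ALLOW", lambda f: f.get("truncated", False), [])],
)

full_ip = {"datatype": "IPAddress", "purpose": "Advertising", "truncated": False}
trunc_ip = {"datatype": "IPAddress", "purpose": "Advertising", "truncated": True}
# evaluate(ip_for_ads, full_ip) -> "DENY"; with the truncated flow -> "ALLOW"
```

The "innermost match wins" rule is what makes nested EXCEPT blocks compose without ambiguity.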
2. Grok
- Data inventory: annotate code and data with policy datatypes
- Source labels propagated via the data-flow graph
- Different noisy sources: variable-name analysis, developer annotations, ...
(Diagram: a data-flow graph of datasets carrying labels such as IPAddress, Name, Age, IDX, Timestamp, flowing through processes such as GeoIP, NewAcct, Login, Check Fraud, Check Hijack, and Reporting.)
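The data-inventory step can be sketched as label propagation to a fixpoint over the data-flow graph (node and label names below are illustrative, not Grok's actual schema): seed nodes carry policy datatype labels, and labels flow forward along edges until nothing changes.

```python
# Propagate policy datatype labels along a data-flow graph: every node
# reachable from a labeled source inherits that source's labels.

def propagate(edges, seeds):
    labels = {n: set(ls) for n, ls in seeds.items()}
    changed = True
    while changed:                       # iterate to a fixpoint
        changed = False
        for src, dst in edges:
            new = labels.get(src, set()) - labels.get(dst, set())
            if new:                      # dst is missing some of src's labels
                labels.setdefault(dst, set()).update(new)
                changed = True
    return labels

# Illustrative graph: a raw log feeds a GeoIP job, which feeds an ads job.
edges = [("RawLog", "GeoIP"), ("GeoIP", "AdsJob")]
labels = propagate(edges, {"RawLog": {"IPAddress"}})
# "AdsJob" now carries the IPAddress label, so a Legalease rule denying
# IPAddress for advertising can be checked against it.
```

Checking a Legalease policy then amounts to intersecting each node's propagated labels with the datatypes the policy restricts for that node's purpose.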