Security and Data Privacy Instructor: Pratiksha Thaker cs245.stanford.edu
Outline
Security requirements
Key concepts and tools
Differential privacy
Other security tools
CS 245 2
Why Security & Privacy? CS 245 4
Why Security & Privacy?
Data is valuable & can cause harm if released
» Example: medical records, purchase history, internal company documents, etc.
Data releases can't usually be "undone"
Security policies can be complex
» Each user can only see data from their friends
» Analyst can only query aggregate data
» Users can ask to delete their derived data
CS 245 5
Why Security & Privacy?
It's the law! New regulations about user data:
US HIPAA: Health Insurance Portability & Accountability Act (1996)
» Mandatory encryption, access control, training
EU GDPR, CA CCPA (2018)
» Users can ask to see & delete their data
PCI DSS: Payment Card Industry standard (2004)
» Required in contracts with MasterCard, etc.
CS 245 6
Consequence
Security and privacy must be baked into the design of data-intensive systems
» Often a key differentiator for products!
CS 245 7
The Good News
The declarative interface of many data-intensive systems can enable powerful security features
» One of the "big ideas" in our class!
Example: System R's access control on views
[diagram: users read and write through a view, which is defined by an arbitrary SQL query over the underlying SQL tables]
CS 245 8
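A minimal sketch of this idea (not System R's actual mechanism or syntax), using Python's built-in sqlite3 module; the table, view, and column names are invented for illustration. SQLite has no GRANT/REVOKE, so a comment marks where a full DBMS like System R would grant analysts access to the view but not the base table.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary REAL)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [("Alice", "Eng", 150.0), ("Bob", "Eng", 140.0), ("Carol", "Sales", 120.0)],
    )

    # The view is just a stored SQL query; it exposes only per-department
    # aggregates, never individual rows from the base table.
    conn.execute("""
        CREATE VIEW dept_salary AS
        SELECT dept, AVG(salary) AS avg_salary, COUNT(*) AS n
        FROM employees GROUP BY dept
    """)

    # In System R the DBMS would now grant SELECT on dept_salary (and nothing
    # on employees) to analysts, so this is all they could see:
    print(conn.execute("SELECT * FROM dept_salary ORDER BY dept").fetchall())
    # [('Eng', 145.0, 2), ('Sales', 120.0, 1)]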
Outline
Security requirements
Key concepts and tools
Differential privacy
Other security tools
CS 245 9
Some Security Goals
Access control: only the "right" users can perform various operations; typically relies on:
» Authentication: a way to verify a user's identity (e.g. a password)
» Authorization: a way to specify which users may take which actions (e.g. file permissions)
Auditing: the system records an incorruptible audit trail of who performed each action
CS 245 10
Some Security Goals
Confidentiality: data is inaccessible to external parties (often via cryptography)
Integrity: data can't be modified by external parties
Privacy: only a limited amount of information about individual users can be learned
CS 245 11
Clarifying These Goals
Say our goal is access control: only Matei can set CS 245 student grades on Axess
What scenarios should Axess protect against?
1. Bobby T. (an evil student) logging into Axess as himself and being able to change grades
2. Bobby sending hand-crafted network packets to Axess to change his grades
3. Bobby getting a job as a DB admin at Axess
4. Bobby guessing Matei's password
5. Bobby blackmailing Matei to change his grade
6. Bobby discovering a flaw in AES to do #2
CS 245 18
Threat Models
To meaningfully reason about security, we need a threat model: what adversaries may do
» Same idea as failure models!
For example, in our Axess scenario, assume:
» Adversaries only interact with Axess through its public API
» No crypto algorithm or software bugs
» No password theft
Implementing complex security policies can be hard even with these assumptions!
CS 245 19
Threat Models
No useful threat model can cover everything
» Goal is to cover the most feasible attack scenarios and so increase the cost of attacks
Threat models also let us divide security tasks across different components
» E.g. the auth system handles passwords, 2FA
CS 245 20
Threat Models
[XKCD comic omitted] Source: XKCD.com
CS 245 21
Useful Building Blocks
Encryption: encode data so that only parties with a key can efficiently decrypt it
Cryptographic hash functions: hard to find an input with a given hash (or to find collisions)
Secure channels (e.g. TLS): confidential, authenticated communication between 2 parties
CS 245 22
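A small illustration of two of these building blocks using only Python's standard library; the record and key below are made up. (Encryption itself and TLS channels need an external crypto library, so they appear only in the comments.)

    import hashlib, hmac, secrets

    record = b"student=bobby;course=cs245;grade=A+"

    # Cryptographic hash: any change to the record changes the digest, and it
    # is computationally hard to find a different record with the same hash.
    digest = hashlib.sha256(record).hexdigest()

    # HMAC: only parties holding `key` can produce a valid tag, so a recipient
    # can check both the integrity and the origin of the record (one building
    # block used inside secure channels such as TLS).
    key = secrets.token_bytes(32)
    tag = hmac.new(key, record, hashlib.sha256).hexdigest()
    assert hmac.compare_digest(
        tag, hmac.new(key, record, hashlib.sha256).hexdigest()
    )
    print(digest[:16], tag[:16])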
Security Tools from DBMSs
First-class concept of users + access control
» Views as in System R, tables, etc.
Secure channels for network communication
Audit logs for analysis
Encryption of data on disk (perhaps at the OS level)
CS 245 23
Modern Tools for Security
Privacy metrics and enforcement thereof (e.g. differential privacy)
Computing on encrypted data (e.g. CryptDB)
Hardware-assisted security (e.g. enclaves)
Multi-party computation (e.g. secret sharing)
CS 245 24
Outline
Security requirements
Key concepts and tools
Differential privacy
Other security tools
CS 245 25
Threat Model
[diagram: data analysts send queries to a database server that stores a table with private data]
• Database software is working correctly
• Adversaries only access it through the public API
• Adversaries have a limited # of user accounts
CS 245 26
Private statistics
Are aggregate statistics more private than individual data? No!
SELECT AVG(income) FROM professors WHERE state = "California"
SELECT AVG(income) FROM professors WHERE name = "Matei Zaharia"
29
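The second query above already shows one problem: an "aggregate" can be scoped to a single person. A toy differencing attack (invented names and incomes, using Python's sqlite3) shows that even aggregates over many people leak individual values when results can be compared:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE professors (name TEXT, state TEXT, income REAL)")
    conn.executemany(
        "INSERT INTO professors VALUES (?, ?, ?)",
        [("A", "California", 200.0), ("B", "California", 220.0),
         ("Matei Zaharia", "California", 260.0)],
    )

    base = "SELECT SUM(income) FROM professors WHERE state = 'California'"
    total_all = conn.execute(base).fetchone()[0]
    total_others = conn.execute(base + " AND name != 'Matei Zaharia'").fetchone()[0]

    # Both answers were aggregates, but their difference is one person's income.
    print(total_all - total_others)   # 260.0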
Idea: differential privacy
A contract for algorithms that output statistics
Intuition: the function is differentially private if removing or changing a data point does not change the output "too much"
Intuition: plausible deniability
Formally: a randomized algorithm M that computes the statistic is ε-differentially private if, for all datasets A and B that differ in one element, and for any subset S of possible outcomes,
Pr[M(A) ∈ S] ≤ e^ε · Pr[M(B) ∈ S]
ε is the privacy parameter: smaller ε ≈ more privacy, less accuracy
38
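One standard way to satisfy this definition (a sketch, not anything specific to these slides) is the Laplace mechanism: add noise drawn from a Laplace distribution whose scale is the query's sensitivity (how much one row can change the answer) divided by ε. For a COUNT query the sensitivity is 1; the true count below is made up.

    import numpy as np

    def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
        """Return an epsilon-differentially-private estimate of true_value."""
        rng = rng or np.random.default_rng()
        noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
        return true_value + noise

    # Example: a private count with epsilon = 0.1 (COUNT has sensitivity 1).
    print(laplace_mechanism(true_value=1000, sensitivity=1, epsilon=0.1))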
What Does It Mean?
Say an adversary runs some query and observes a result
The adversary has some set of results, S, that lets them infer something about Matei if the observed result falls in S
Then: Pr[result ∈ S | Matei in DB] ≤ e^ε · Pr[result ∈ S | Matei not in DB], and e^ε ≈ 1 + ε for small ε
⇒ Similar outcomes whether or not Matei is in the DB
CS 245 39
What Does It Mean?
Private information is noisy. Can we determine anything useful?
Query: SELECT COUNT(*) FROM patients WHERE causeOfDeath = ...
Assume ε = 0.1
CS 245 41
What Does It Mean? With enough base signal, DP can still give useful information! CS 245 44
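A rough numerical illustration of "enough base signal" (assumed counts, not numbers from the slides): with ε = 0.1 the Laplace noise added to a COUNT has scale 1/ε = 10, which overwhelms a true count of 5 but is negligible next to a true count of 10,000.

    import numpy as np

    rng = np.random.default_rng(0)
    epsilon = 0.1
    scale = 1.0 / epsilon        # sensitivity of COUNT is 1

    for true_count in (5, 100, 10_000):
        noisy = true_count + rng.laplace(scale=scale, size=10_000)
        rel_err = np.mean(np.abs(noisy - true_count)) / true_count
        print(f"true count {true_count:>6}: average relative error ~ {rel_err:.1%}")
    # roughly 200%, 10%, and 0.1% respectively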
Side Information Consider the following query: SELECT AVG(income) FROM professors WHERE state=“CA” 45