CS573 Data Privacy and Security Statistical Databases Li Xiong - PowerPoint PPT Presentation

CS573 Data Privacy and Security Statistical Databases Li Xiong Department of Mathematics and Computer Science Emory University

• Statistical databases – Definitions – Early query restriction methods – Output perturbation and differential privacy (next lecture)

Statistical Database • A statistical database is a database that provides statistics on subsets of records • Statistics such as SUM, MEAN, MEDIAN, COUNT, MAX, and MIN may be computed over the records • Two types: – Pure statistical database: • stores only statistical data, e.g., a census database – Ordinary database with statistical access: • contains individual entries • some users have normal access, others only statistical access Slide credit: Dr Lawrie Brown (UNSW@ADFA) for “Computer Security: Principles and Practice”, 1/e, by William Stallings and Lawrie Brown, Chapter 5 “Database Security”.

Statistical Database • Objective: provide statistical users with aggregate information without compromising the confidentiality of any individual entity represented in the database • The database administrator must prevent, or at least detect, a statistical user who attempts to gain individual information through one or a series of statistical queries • Inference control aims to prevent inference from statistics to individual records Slide credit: Dr Lawrie Brown (UNSW@ADFA) for “Computer Security: Principles and Practice”, 1/e, by William Stallings and Lawrie Brown, Chapter 5 “Database Security”.

Statistical Database Security • Statistics are derived from a database by means of a characteristic formula $C$ – a logical formula over the values of attributes – E.g., $C$ = (Age = 42) & (Sex = Male) & (Employer = ABC) • The query set $X(C)$ of characteristic formula $C$ is the set of records matching $C$ • A statistical query is a query that produces a value calculated over a query set • E.g., COUNT(Age = 42) Slide credit: Dr Lawrie Brown (UNSW@ADFA) for “Computer Security: Principles and Practice”, 1/e, by William Stallings and Lawrie Brown, Chapter 5 “Database Security”.
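The definitions on this slide can be made concrete with a small sketch. The table, the field names, and the helper functions `query_set`, `count`, and `total` below are hypothetical and chosen only for illustration; a characteristic formula is modeled as a Python predicate over a record.

```python
# Hypothetical toy table: every name and value here is invented for illustration.
RECORDS = [
    {"name": "Ann", "age": 42, "sex": "Female", "employer": "ABC", "salary": 50},
    {"name": "Bob", "age": 42, "sex": "Male", "employer": "ABC", "salary": 60},
    {"name": "Cat", "age": 35, "sex": "Female", "employer": "XYZ", "salary": 70},
    {"name": "Dan", "age": 42, "sex": "Male", "employer": "XYZ", "salary": 80},
    {"name": "Eve", "age": 29, "sex": "Female", "employer": "ABC", "salary": 90},
]


def query_set(records, formula):
    """Return X(C): the records that satisfy the characteristic formula C."""
    return [r for r in records if formula(r)]


def count(records, formula):
    """Statistical query COUNT over the query set X(C)."""
    return len(query_set(records, formula))


def total(records, formula, attribute):
    """Statistical query SUM of one attribute over the query set X(C)."""
    return sum(r[attribute] for r in query_set(records, formula))


# C = (Age = 42) & (Sex = Male) & (Employer = ABC)
C = lambda r: r["age"] == 42 and r["sex"] == "Male" and r["employer"] == "ABC"

print(count(RECORDS, lambda r: r["age"] == 42))  # COUNT(Age = 42)
print(count(RECORDS, C), total(RECORDS, C, "salary"))
```

In this toy table the query set of `C` contains a single record, so the SUM over it is that individual's salary: an aggregate query over a very small query set reveals an individual value.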

Inference from a Statistical Database • Statistical user is restricted to obtaining only aggregate, or statistical, data from the database and is prohibited access to individual records • Inference problem: – user may infer confidential information about individual entities represented in the SDB – Such an inference is called a compromise • Positive compromise – determine an attribute has a particular value • Negative compromise – determine an attribute does not have a particular value • In some cases, a sequence of queries may reveal information Partial slide credit: Computer Security and Statistical Databases By William Stallings (http://www.informit.com/articles/article.aspx?p=782117)

Inference from a Statistical Database • The inference problem for an SDB can be stated as follows: – A characteristic function C defines a subset of records (rows) within the database – A query using C provides statistics on the selected subset – If the subset is small enough, perhaps even a single record, the questioner may be able to infer characteristics of a single individual or a small group Slide credit: Computer Security and Statistical Databases By William Stallings (http://www.informit.com/articles/article.aspx?p=782117)

Methods • Data perturbation/anonymization • Query restriction • Output perturbation

Data Perturbation [Figure: User 1 and User 2 send queries to a perturbed database, produced by adding noise to the original database, and receive results from it] • Data perturbation introduces noise in the data • Provides answers to all queries, but the answers are approximate

Query Restriction [Figure: Query 1 and Query 2 issued against the original database; labels: Query, Results, K] • Rejects a query that could lead to a compromise • The answers provided are accurate

Output Perturbation [Figure: User 1 and User 2 send queries to the original database; noise is added to the results before they are returned] • Perturbs the answers to user queries while leaving the data in the SDB unchanged • Generates statistics that are modified from those the original database would provide
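The difference between data perturbation and output perturbation can be sketched as follows. The slides do not specify a noise distribution; bounded uniform noise is an assumption made here purely to keep the example simple, and the function names `perturb_data` and `perturbed_sum` are hypothetical.

```python
import random


def perturb_data(values, scale, rng):
    """Data perturbation: build a noisy copy of the data once.

    Queries are then answered from the noisy copy, not from the original.
    Uniform noise in [-scale, scale] is an illustrative assumption.
    """
    return [v + rng.uniform(-scale, scale) for v in values]


def perturbed_sum(values, scale, rng):
    """Output perturbation: compute the true SUM, then add noise to the result.

    The stored values are left unchanged.
    """
    return sum(values) + rng.uniform(-scale, scale)


rng = random.Random(0)  # seeded only so that runs are repeatable
values = [50, 60, 70]

noisy_copy = perturb_data(values, 5, rng)   # one noise draw per record
answer = perturbed_sum(values, 5, rng)      # one noise draw per query

print(noisy_copy)
print(answer)
```

With data perturbation the error of a SUM grows with the number of records in the query set, since every record carries its own noise; with output perturbation a single noise term is added per answer.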

Methods • Data perturbation/anonymization • Query restriction – Query set size control – Query set overlap control – Query auditing • Output perturbation

Query Set Size Control • Simplest form of query restriction • Query-set size control bounds the number of records that must be in the query set • A query $q(C)$ is permitted (its result is released) only if the number of records that match $C$ satisfies $K \le |X(C)| \le L - K$, where $L$ is the size of the database and $K$ is a parameter that satisfies $0 \le K \le L/2$
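A minimal sketch of the size check, assuming records are Python dictionaries and a characteristic formula is a predicate. The names `restricted_count` and `QueryRefused` are hypothetical; `k` plays the role of $K$ and `len(records)` the role of $L$.

```python
class QueryRefused(Exception):
    """Raised when a query set is too small or too large to be released."""


def restricted_count(records, formula, k):
    """Answer COUNT(C) only if K <= |X(C)| <= L - K."""
    size_of_db = len(records)
    if not 0 <= k <= size_of_db / 2:
        raise ValueError("k must satisfy 0 <= k <= L/2")
    size = sum(1 for r in records if formula(r))
    if not (k <= size <= size_of_db - k):
        raise QueryRefused(f"query set size {size} is outside the allowed range")
    return size


# Hypothetical table of 8 records, alternating Female/Male.
RECORDS = [
    {"id": i, "sex": "Female" if i % 2 == 0 else "Male", "age": 30 + i}
    for i in range(8)
]

print(restricted_count(RECORDS, lambda r: r["sex"] == "Female", k=2))

for formula in (lambda r: r["age"] == 30, lambda r: True):
    try:
        restricted_count(RECORDS, formula, k=2)
    except QueryRefused as refusal:
        print("refused:", refusal)
```

The upper bound matters as much as the lower one: a query set covering all but one record would let a user subtract its statistic from the statistic for the whole database and isolate the remaining record.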

Query Set Size Control [Figure: Query 1 and Query 2 issued against the original database; labels: Query, Results, K]

Tracker • Q1: Count ( Sex = Female ) = A • Q2: Count ( Sex = Female OR (Age = 42 & Sex = Male & Employer = ABC) ) = B What if B = A+1?

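Tracker • Q1: Count ( Sex = Female ) = A • Q2: Count ( Sex = Female OR (Age = 42 & Sex = Male & Employer = ABC) ) = B • If B = A+1, exactly one record matches (Age = 42 & Sex = Male & Employer = ABC) • Q3: Count ( Sex = Female OR (Age = 42 & Sex = Male & Employer = ABC & Diagnosis = Schizophrenia) ) • If the response to Q3 is B, the individual has the diagnosis: positive compromise • If the response to Q3 is A, the individual does not have the diagnosis: negative compromise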
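The tracker queries Q1 to Q3 can be replayed against a size-restricted COUNT interface. The table, the threshold `k = 2`, and the helpers `make_counter` and `tracker_attack` are hypothetical; the sketch assumes the tracker formula (Sex = Female) and the target formula never match the same record.

```python
def person(sex, age, employer, diagnosis):
    return {"sex": sex, "age": age, "employer": employer, "diagnosis": diagnosis}


# Hypothetical table; exactly one record is (Age = 42 & Sex = Male & Employer = ABC).
RECORDS = [
    person("Female", 30, "ABC", "Asthma"),
    person("Female", 42, "ABC", "None"),
    person("Female", 51, "XYZ", "None"),
    person("Female", 38, "XYZ", "Diabetes"),
    person("Male", 42, "ABC", "Schizophrenia"),  # the target individual
    person("Male", 42, "XYZ", "None"),
    person("Male", 29, "ABC", "None"),
    person("Male", 55, "XYZ", "Asthma"),
    person("Male", 61, "ABC", "None"),
    person("Male", 33, "XYZ", "Diabetes"),
]


def make_counter(records, k):
    """Return a COUNT interface that enforces K <= |X(C)| <= L - K."""
    def count(formula):
        size = sum(1 for r in records if formula(r))
        return size if k <= size <= len(records) - k else None  # None = refused
    return count


def tracker_attack(count, tracker, target, sensitive):
    """Return True/False for the target's sensitive property, or None if the attack fails."""
    a = count(tracker)                                    # Q1
    b = count(lambda r: tracker(r) or target(r))          # Q2
    if a is None or b is None or b != a + 1:
        return None  # a query was refused, or the target is not a single record
    c = count(lambda r: tracker(r) or (target(r) and sensitive(r)))  # Q3
    return None if c is None else c == b


count = make_counter(RECORDS, k=2)
female = lambda r: r["sex"] == "Female"
target = lambda r: r["age"] == 42 and r["sex"] == "Male" and r["employer"] == "ABC"

print(count(target))  # the direct query is refused
print(tracker_attack(count, female, target, lambda r: r["diagnosis"] == "Schizophrenia"))
print(tracker_attack(count, female, target, lambda r: r["diagnosis"] == "Diabetes"))
```

Every query the attacker issues has a query set of size 4 or 5, well inside the allowed range, which is why size control alone does not stop the inference.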

Query set size control • If the threshold value $K$ is large, it will restrict too many queries – and it still does not guarantee protection from compromise • The database can be easily compromised within a frame of 4-5 queries

Query Set Overlap Control • Basic idea: each new query is checked against previously answered queries for the number of records they have in common • If the number of records in common with any earlier query exceeds a given threshold, the requested statistic is not released • A query $q(C)$ is allowed only if $|X(C) \cap X(D)| \le r$ for all queries $q(D)$ that have already been answered for this user, where $r$ is a fixed integer greater than 0
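A sketch of the overlap check, assuming the system remembers, per user, the query set of every answered query as a set of record positions. The class name `OverlapController` and the toy table are hypothetical.

```python
class OverlapController:
    """Release COUNT(C) only if X(C) shares at most r records with each earlier query set."""

    def __init__(self, records, r):
        if r <= 0:
            raise ValueError("r must be a positive integer")
        self.records = records
        self.r = r
        self.history = {}  # user -> list of previously answered query sets

    def count(self, user, formula):
        ids = frozenset(i for i, rec in enumerate(self.records) if formula(rec))
        for earlier in self.history.get(user, []):
            if len(ids & earlier) > self.r:
                return None  # refused: too much overlap with an answered query
        self.history.setdefault(user, []).append(ids)
        return len(ids)


# Hypothetical table: ten records with ages 20..29.
RECORDS = [{"age": 20 + i} for i in range(10)]
db = OverlapController(RECORDS, r=2)

print(db.count("alice", lambda rec: rec["age"] < 25))        # answered
print(db.count("alice", lambda rec: rec["age"] < 24))        # subset of the first query: refused
print(db.count("alice", lambda rec: 23 <= rec["age"] < 27))  # overlap of 2: answered
print(db.count("bob", lambda rec: rec["age"] < 24))          # a second user is not restricted
```

The last call shows why per-user histories are not enough when users collude: two users can split between them a pair of queries that neither could ask alone. The loop over `history` is also the source of the processing overhead, since every new query is compared with all earlier ones.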

Query-set-overlap control • Statistics for a set and its subset cannot be released – limiting usefulness • High processing overhead – every new query compared with all previous ones • Multiple users - need to keep user profile, need to consider collusion between users • Still no formal privacy guarantee

Auditing • Keep up-to-date logs of all queries made by each user and check for possible compromise whenever a new query is issued • Excessive computation and storage requirements • “Efficient” methods exist only for special types of queries

Audit Expert (Chin 1982) • Query auditing method for SUM queries • Given sensitive values $x_1, \ldots, x_L$, any SUM query on those values can be modeled as an equation $q = a_1 x_1 + a_2 x_2 + \cdots + a_L x_L$ • where $a_i = 1$ if $x_i$ (record $i$) belongs to the query set and $a_i = 0$ otherwise, and $q$ is the query result • A set of $m$ SUM queries can be thought of as a system of linear equations $AX = D$, where $A$ is an $m \times L$ binary matrix, $X$ is the vector of sensitive values, and $D$ is the vector of query results • Maintains the binary matrix representing linearly independent queries and updates it when a new query is issued • A row with all 0s except for the $i$-th column indicates disclosure of $x_i$
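The linear-algebra view of SUM auditing can be illustrated with a naive sketch. This is not the Audit Expert algorithm itself and makes no attempt at its efficiency: it simply re-reduces the matrix of answered queries to reduced row echelon form with exact fractions and denies a query if the result would contain a row with a single nonzero entry. The names `rref` and `SumAuditor` are hypothetical.

```python
from fractions import Fraction


def rref(rows):
    """Return the nonzero rows of the reduced row echelon form (exact arithmetic)."""
    rows = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(rows[0]) if rows else 0
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == len(rows):
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        src = next((i for i in range(pivot_row, len(rows)) if rows[i][col] != 0), None)
        if src is None:
            continue
        rows[pivot_row], rows[src] = rows[src], rows[pivot_row]
        pivot = rows[pivot_row][col]
        rows[pivot_row] = [v / pivot for v in rows[pivot_row]]
        for i in range(len(rows)):
            if i != pivot_row and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[pivot_row])]
        pivot_row += 1
    return [row for row in rows if any(v != 0 for v in row)]


class SumAuditor:
    """Answer SUM queries unless the answered set would pin down a single value."""

    def __init__(self, values):
        self.values = list(values)
        self.basis = []  # reduced rows for the queries answered so far

    def sum_query(self, indices):
        row = [1 if i in indices else 0 for i in range(len(self.values))]
        candidate = rref(self.basis + [row])
        # A row with exactly one nonzero entry means some x_i could be solved for.
        if any(sum(1 for v in r if v != 0) == 1 for r in candidate):
            return None  # deny
        self.basis = candidate
        return sum(self.values[i] for i in indices)


auditor = SumAuditor([10, 20, 30, 40])  # hypothetical sensitive values
print(auditor.sum_query({0, 1, 2}))  # answered
print(auditor.sum_query({0, 1}))     # denied: the difference would reveal the third value
print(auditor.sum_query({2, 3}))     # answered
print(auditor.sum_query({3}))        # denied: a single-record query
```

Checking only the reduced rows is sufficient here because a unit vector lies in the row space of a matrix exactly when it appears as a row of its reduced row echelon form.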

Audit Expert • $O(L^2)$ time complexity • Further work reduced this to $O(L)$ time and space when the number of queries is less than $L$ • Only for SUM queries

Auditing – recent developments • Online auditing – “Detect and deny” queries that violate the privacy requirement – Denials themselves may implicitly disclose sensitive information – Prevents privacy breaches on the fly • Offline auditing – Checks whether a privacy requirement has been violated after the queries have been executed – The objective is not to prevent breaches but to check compliance with the privacy requirement

Methods • Data perturbation/anonymization • Query restriction • Output perturbation • Differential privacy
