
Towards Practical Differential Privacy for SQL Queries Noah Johnson, Joseph P. Near, Dawn Song UC Berkeley

Outline 1. Discovering real-world requirements 2. Elastic sensitivity & calculating sensitivity of SQL queries 3. Our experience: lessons & challenges

Part 1 Discovering Real-world Requirements

Our collaboration with Uber • Uber’s goal: deploy differential privacy • Internally (for some analysts) • Externally (for partners & regulators) • Our goals • Explore real-world requirements for differential privacy • Build open-source systems

Previous work on differential privacy for analytics: insufficient for real-world applications Previous work: either… • Theoretical (does not explore practical applications) • Targets specialized analytics tasks • Google RAPPOR: browsing statistics • Apple: keyboard & emoji trends Result: little use in real-world analytics environments • No practical, scalable systems for DP in analytics

Empirical study: understanding real-world data analytics • Conducted large-scale empirical study of real-world analytics queries • Dataset: 8 million SQL queries written by data analysts at Uber • Covers wide range of use cases: fraud detection, marketing, business metrics, etc. • Goal: identify DP requirements for real-world workload

Empirical study results The most common aggregations are COUNT, SUM, AVG, MAX, and MIN. [Bar chart: COUNT 39.3%, SUM 22.6%, AVG 6.5%, MAX 4.6%, MIN 3.8%, MEDIAN 0.2%, STDDEV 0.1%] → Most existing DP mechanisms support only counting queries

Empirical study results 62% of queries use JOIN, and some queries use many joins. [Chart: joins in query (axis from 0 to 95) against # queries (log axis from 1 to 1,000,000)] → Very few existing mechanisms support join

Empirical study results Many different databases in use. [Bar chart, # queries on a log axis: Vertica 6,362,631; Postgres 1,494,680; MySQL 94,206; Hive 81,660; Presto 39,521; Other 29,387] → Existing approaches require modifying/replacing DB

Part 2 Elastic Sensitivity & Analyzing SQL Queries

Global sensitivity vs. local sensitivity for joins Global sensitivity • Unbounded for queries with joins • Single added join key in one table could match an unbounded number of keys in another Local sensitivity • Bounded for queries with joins • Data in true database bounds number of possible new matches • Computationally expensive • Must consider every possible change to true database
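The contrast above can be made concrete with a brute-force sketch. It assumes a toy setting that is not in the slides: each table is reduced to its join-key column, neighboring databases differ by adding or removing one row, and new keys come from a small finite `domain`. The function names are hypothetical. Enumerating every neighbor is exactly the "consider every possible change" cost the slide describes, and the result grows with the key multiplicity in the other table, which is why no fixed global bound exists.

```python
from collections import Counter


def join_count(a_keys, b_keys):
    """COUNT(*) of A JOIN B ON A.k = B.k, given each table's join-key column."""
    b_freq = Counter(b_keys)
    return sum(b_freq[k] for k in a_keys)


def neighbors(keys, domain):
    """Yield every key column reachable by adding or removing exactly one row."""
    for k in domain:
        yield keys + [k]
    for i in range(len(keys)):
        yield keys[:i] + keys[i + 1:]


def local_sensitivity(a_keys, b_keys, domain):
    """Largest change in the join count over all one-row changes to A or to B."""
    base = join_count(a_keys, b_keys)
    changes = [abs(join_count(a2, b_keys) - base) for a2 in neighbors(a_keys, domain)]
    changes += [abs(join_count(a_keys, b2) - base) for b2 in neighbors(b_keys, domain)]
    return max(changes)


print(local_sensitivity([1], [1, 1], domain=[1, 2]))      # 2
print(local_sensitivity([1], [1] * 100, domain=[1, 2]))   # 100
```

The second call shows the local sensitivity tracking the data: with 100 matching rows in B, one extra row in A changes the count by 100.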

Elastic sensitivity Upper bound on local sensitivity • Efficient, compositional calculation from query Supports queries with equijoins • Insight : increase in size of joined relation tightly bounded by multiplicities of join keys • Key multiplicities queried from database in advance Supports more than just count • Works well for COUNT • Works less well for SUM

Example: elastic sensitivity of join SELECT COUNT(*) FROM A JOIN B ON A.k = B.k • Duplicate join key 1 causes duplicate rows in joined relation: A = {(k=1, v=a)}, B = {k=1, k=1}, so A JOIN B = {(1, a), (1, a)} • Maximum change in COUNT: add another 1 to A, giving A = {(1, a), (1, b)} and A JOIN B = {(1, a), (1, a), (1, b), (1, b)} • Local sensitivity = 2 • In general: local sensitivity bounded by maximum multiplicities of k in A and B
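The slide's example can be checked with a short sketch. It covers only the simplest case stated on the slide (a COUNT over a single equijoin of two distinct tables, with a one-row change) and is not the full elastic sensitivity calculation. The function names are hypothetical, and each table is represented by its join-key column alone.

```python
from collections import Counter


def join_count(a_keys, b_keys):
    """COUNT(*) of A JOIN B ON A.k = B.k, given each table's join-key column."""
    b_freq = Counter(b_keys)
    return sum(b_freq[k] for k in a_keys)


def max_multiplicity(keys):
    """Largest number of rows sharing one join-key value (0 for an empty table)."""
    return max(Counter(keys).values(), default=0)


def join_count_bound(a_keys, b_keys):
    """Bound on the change in the join COUNT from one added or removed row:
    the larger of the two maximum key multiplicities."""
    return max(max_multiplicity(a_keys), max_multiplicity(b_keys))


a_keys = [1]       # A = {(1, a)}
b_keys = [1, 1]    # B holds key 1 twice

before = join_count(a_keys, b_keys)         # 2 joined rows
after = join_count(a_keys + [1], b_keys)    # 4 joined rows after adding (1, b) to A
print(after - before)                       # 2
print(join_count_bound(a_keys, b_keys))     # 2
```

The bound needs only the key multiplicities, which matches the slide's point that they can be queried from the database in advance rather than recomputed per neighbor.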

A static analysis framework for SQL queries Built a practical framework for analyzing real-world queries Challenge: these queries are complex Our framework: • Solve complexity once • Enable many different analyses

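Differential privacy for SQL queries using Elastic Sensitivity [Diagram: the SQL query goes to the analysis framework, whose elastic sensitivity analysis yields the elastic sensitivity; the query also runs on the database and yields sensitive results; output perturbation combines the sensitive results with the elastic sensitivity to produce differentially private results]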
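The output perturbation step can be illustrated with a generic Laplace sketch. This is an assumption-laden illustration, not the deployed mechanism: the function names and parameters are hypothetical, and a sensitivity bound derived from the true database (such as a bound on local sensitivity) generally needs an additional smoothing step before noise calibrated to it gives a formal guarantee. The sketch shows only how noise scale follows from a sensitivity value and a privacy parameter `epsilon`.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(true_result, sensitivity, epsilon, rng=random):
    """Add Laplace noise with scale sensitivity / epsilon to a numeric result."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return true_result + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # seeded only to make the illustration repeatable
noisy = perturb(true_result=100, sensitivity=2, epsilon=0.5, rng=rng)
print(noisy)  # a value near 100; the noise scale here is 2 / 0.5 = 4
```

Because the noise is added to the query's output, the database engine itself is left unmodified, which is the property the diagram emphasizes.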

Empirical evaluation results Dataset: 9,862 Uber queries, run on production database

Part 3 Lessons Learned & Future Challenges

Value of close collaboration • Opportunity to examine real use cases • Dataset of queries: what analysts actually did • Insight into privacy goals in the real world • e.g. concern about external and internal sharing • Discover requirements & infrastructure restrictions • e.g. we really can’t modify the database engine

Challenges of close collaboration • Analysts skeptical about need for privacy protections • Concerned about utility • Believe privacy is already protected • e.g. machine learning teams believe models protect privacy • Privacy team unsure of privacy goals • Belief that de-identification is enough, or • Differential privacy seen as a silver bullet • Would like to “have differential privacy” all in one go • Infrastructure teams want a one-size-fits-all solution • Multiple solutions = more work

Conclusions • Perfect deployment will take time, experimentation • Early versions will be limited • There will be bugs • We can accelerate the process • Encouragement • Constructive engagement • We should encourage transparency • Secrecy encourages bugs, discourages adoption https://github.com/uber/sql-differential-privacy https://arxiv.org/abs/1706.09479 jnear@berkeley.edu Thank you!
