Oblivious Coopetitive Analytics Using Hardware Enclaves. Ankur Dave, Chester Leung, Raluca Ada Popa, Joseph E. Gonzalez, Ion Stoica (UC Berkeley). EuroSys 2020, April 28, 2020
The need for coopetitive analytics • Analytics can extract value from big data • But datasets often span multiple competing parties
Example: Financial risk assessment • Banks want to assess systemic risk: “How much subprime debt have all banks issued?” • This requires cooperation among competing banks • Sharing data creates security, regulatory, business, and liability concerns
SELECT SUM(loan_amount) FROM customer c JOIN loan l ON c.ssn = l.ssn WHERE credit_score < 630;
Threat model • Network attackers can see and modify all network traffic but cannot access machines (e.g., they observe “240 bytes sent from party 2 to party 3”) • Malicious party attackers can additionally see and modify computation within their own machines and collude with other parties (e.g., they can inspect code such as “if (c.credit_score < 630) { result[c.ssn] += c.loan_amount }”)
Approach 1: Cryptography. Specialized systems: Conclave, DJoin, private intersection-sum, Prio, UnLynx, MedCo, … • Limited functionality: cannot support rich analytics. Generic approaches: SMCQL, AgMPC • Prohibitive overhead
Approach 2: Hardware enclaves • Trusted code runs shielded from the OS and other processes on the same host • Drawback: memory access pattern leakage [Figure: enclaves holding secret data and trusted code on untrusted OSes, connected via remote attestation]
Access pattern leakage: access patterns leak information such as filter selectivity. Example (joined customer and loan rows):
ID   Credit score   Loan amount
1    720            $2,500
2    600            $500
2    600            $250
3    600            $500
SELECT SUM(loan_amount) FROM customer c JOIN loan l ON c.ssn = l.ssn WHERE credit_score < 630;
Total loans: $1,250. [Figure: memory accesses go only to the rows that pass the filter, revealing its selectivity]
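A minimal Python sketch of the leak, assuming the joined rows are already in memory as (id, credit_score, loan_amount) tuples: a filter that branches on the credit score reads a loan amount only for matching rows, so the recorded access trace exposes how many (and which) rows passed the filter. The function name leaky_filtered_sum and the trace list are illustrative, not part of OCQ.

```python
# Example table: (id, credit_score, loan_amount)
ROWS = [(1, 720, 2500), (2, 600, 500), (2, 600, 250), (3, 600, 500)]

def leaky_filtered_sum(rows, threshold=630):
    """Non-oblivious filter + SUM that records which rows' loan amounts it reads."""
    total, trace = 0, []
    for i, (_id, score, amount) in enumerate(rows):
        if score < threshold:   # branch on secret data
            trace.append(i)     # loan_amount is read only for matching rows
            total += amount
    return total, trace

total, trace = leaky_filtered_sum(ROWS)
# total == 1250; trace == [1, 2, 3], so an observer learns that 3 of 4 rows matched
```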
Oblivious algorithms: oblivious algorithms hide access patterns at a performance cost. With the same table and query (total loans: $1,250), the row that fails the filter (ID 1, credit score 720) receives a dummy access, so the memory access pattern no longer reveals which rows matched.
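A contrasting sketch, using the same illustrative tuples (oblivious_filtered_sum is a hypothetical name): it touches every row and folds the predicate into arithmetic, so non-matching rows act as dummy accesses and the trace is identical for any data. Python gives no constant-time guarantees, so this only shows the access-pattern idea, not a hardened implementation.

```python
# Example table: (id, credit_score, loan_amount)
ROWS = [(1, 720, 2500), (2, 600, 500), (2, 600, 250), (3, 600, 500)]

def oblivious_filtered_sum(rows, threshold=630):
    """Filter + SUM whose sequence of row accesses does not depend on the data."""
    total, trace = 0, []
    for i, (_id, score, amount) in enumerate(rows):
        match = int(score < threshold)  # 0 or 1; no branch on the predicate
        trace.append(i)                 # every row is read, matching or not
        total += match * amount         # non-matching rows become dummy accesses
    return total, trace

total, trace = oblivious_filtered_sum(ROWS)
# total == 1250; trace == [0, 1, 2, 3] for any threshold
```

The cost shows up even here: the oblivious version does work for every row instead of only for the matches, which is the performance price mentioned on the slide.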
Previous approaches using hardware enclaves. Not oblivious: SCONE, Graphene, Haven, VC3 • Side-channel leakage. Oblivious: Cipherbase, Opaque • Must maintain a remote copy of large datasets, which is expensive to update • If applied to the WAN setting, inefficient due to high-bandwidth shuffles
Oblivious Coopetitive Queries (OCQ) • Designed for oblivious coopetitive analytics • Supports general SQL queries with better performance than previous approaches • Protects against network attackers and malicious party attackers (in the hardware enclave model)
Oblivious Coopetitive Queries (OCQ): architecture. Jointly approved queries (parties must agree on fixed queries and input data in advance) → OCQ planner (replicated across parties) → secure federated plan (authenticated operators on parties’ own data; oblivious operators on joint data) → federated execution across Party 1, Party 2, …, Party n (each party must have at least one hardware enclave) → shared result
Challenges and Techniques 1. Combining data of mixed sensitivities → Approach: Mixed-sensitivity algorithms 2. Query planning with sensitive cardinalities → Approach: Schema-aware padding 3. Oblivious queries in the wide area → Federated- and security-aware planner
Sensitivity propagation • Parties specify the sensitivity of each table: Public or Sensitive • Sensitivity is propagated according to foreign keys and operators. Example schema: Region (r_zip, r_population); Customer (c_ssn, c_name, c_zip, c_credit_score); Loan (l_id, l_ssn, l_amount); Demographics (d_id, d_zip, d_income); foreign key relationships link l_ssn to c_ssn, c_zip to r_zip, and d_zip to r_zip. [Figure: join (⋈) tree over Demographics, Region, Customer, and Loan]
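A minimal sketch of the propagation step, assuming the simplest conservative rule (an operator's output is Sensitive if any of its inputs is) and ignoring the foreign-key refinements the slide mentions. Plans are written as nested tuples, and the table declarations below are hypothetical.

```python
from enum import IntEnum
from typing import Dict, Tuple, Union

class Sens(IntEnum):
    PUBLIC = 0
    SENSITIVE = 1

# A plan is a table name or a tuple (operator, child, child, ...).
Plan = Union[str, Tuple]

def propagate(plan: Plan, declared: Dict[str, Sens]) -> Sens:
    """Conservative rule: an operator's output is Sensitive if any input is."""
    if isinstance(plan, str):
        return declared[plan]
    _op, *children = plan
    return max(propagate(c, declared) for c in children)

# Hypothetical declarations for the example schema.
DECLARED = {
    "region": Sens.PUBLIC,
    "demographics": Sens.PUBLIC,
    "customer": Sens.SENSITIVE,
    "loan": Sens.SENSITIVE,
}

print(propagate(("join", ("filter", "customer"), "region"), DECLARED).name)  # SENSITIVE
```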
Mixed-sensitivity oblivious join • Joining Sensitive tables across parties produces a mixed-sensitivity join • Mixed-sensitivity oblivious join algorithm: 1. Sort the Public and Sensitive sides separately 2. Oblivious bitonic merge join • Up to 2.5x speedup vs. a fully oblivious join for equal-sized tables [Figure: speedup over a fully oblivious join (0 to 3.0x) vs. join input size (10^2 to 10^7)]
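The two steps can be sketched in Python as follows. This is an illustration under stated assumptions, not OCQ’s implementation: it assumes a primary-key/foreign-key join in which the Public side holds unique join keys, rows are (join_key, payload) pairs with numeric keys, and every helper name is hypothetical. The Public side is sorted with an ordinary sort, the Sensitive side with a bitonic sorting network, and a single bitonic merge plus a linear scan produces one (possibly dummy) output slot per input slot.

```python
from typing import List, Optional, Tuple

INF = float("inf")
PAD = (INF, 2, None)  # padding row (tag 2): sorts after every real row

def compare_exchange(a: list, i: int, j: int, ascending: bool) -> None:
    # Positions (i, j) are fixed by the network; only the swap decision depends on data.
    ki, kj = a[i][:2], a[j][:2]
    if (ki > kj) if ascending else (ki < kj):
        a[i], a[j] = a[j], a[i]

def bitonic_merge(a: list, lo: int, n: int, ascending: bool) -> None:
    # Sorts the bitonic run a[lo:lo+n] (n a power of two).
    if n > 1:
        half = n // 2
        for i in range(lo, lo + half):
            compare_exchange(a, i, i + half, ascending)
        bitonic_merge(a, lo, half, ascending)
        bitonic_merge(a, lo + half, half, ascending)

def bitonic_sort(a: list, lo: int, n: int, ascending: bool) -> None:
    if n > 1:
        half = n // 2
        bitonic_sort(a, lo, half, True)
        bitonic_sort(a, lo + half, half, False)
        bitonic_merge(a, lo, n, ascending)

def next_pow2(n: int) -> int:
    p = 1
    while p < n:
        p *= 2
    return p

def mixed_sensitivity_join(public, sensitive):
    # Rows become (join_key, tag, payload); tag 0 = Public (primary-key) side, 1 = Sensitive.
    # 1a. Public side: its order reveals nothing secret, so an ordinary sort is fine.
    pub = sorted(((k, 0, p) for k, p in public), key=lambda r: r[:2])
    # 1b. Sensitive side: pad to a power of two and sort with the bitonic network.
    sens = [(k, 1, p) for k, p in sensitive]
    sens += [PAD] * (next_pow2(len(sens)) - len(sens))
    bitonic_sort(sens, 0, len(sens), True)
    # 2. Ascending Public run + padding + descending Sensitive run is bitonic; merge it.
    n = next_pow2(len(pub) + len(sens))
    merged = pub + [PAD] * (n - len(pub) - len(sens)) + sens[::-1]
    bitonic_merge(merged, 0, n, True)
    # 3. One linear pass emits exactly one (possibly dummy) output slot per input slot.
    out: List[Optional[Tuple]] = []
    cur = None  # most recent Public row seen in key order
    for key, tag, payload in merged:
        if tag == 0:
            cur = (key, payload)
        match = tag == 1 and cur is not None and cur[0] == key
        out.append((key, cur[1], payload) if match else None)
    # Dummies are dropped here only for display; a real system would compact obliviously.
    return [r for r in out if r is not None]

result = mixed_sensitivity_join(
    public=[(1, "zipA"), (2, "zipB"), (3, "zipC")],
    sensitive=[(2, "loan1"), (3, "loan2"), (2, "loan3"), (5, "loan4")],
)
# sorted(result) == [(2, "zipB", "loan1"), (2, "zipB", "loan3"), (3, "zipC", "loan2")]
```

In this sketch the saving comes from sorting only the Sensitive side with the bitonic network, which costs O(n log² n) comparisons, while merging two sorted runs costs O(n log n) and the Public side can use an ordinary sort.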
Schema-aware padding • Cardinalities are particularly sensitive in the federated setting • Naïve “filter push-up” approaches to padding are very expensive • Find tighter padding bounds using foreign key constraints. Example over the Region, Customer, Loan, and Demographics schema (with its foreign key relationships):
SELECT c_zip, AVG(l_amount / d_income) FROM customer JOIN loan ON c_ssn = l_ssn JOIN region ON c_zip = r_zip JOIN demographics ON r_zip = d_zip GROUP BY c_zip
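A minimal sketch of the bounding idea, assuming each join is annotated with whether one side’s join column is a foreign key into a primary key on the other side (such a join emits at most one row per foreign-key row). The plan classes, the table sizes, and the foreign-key directions (l_ssn referencing c_ssn, c_zip referencing r_zip) are hypothetical, and the size-only bound is just a contrast, not OCQ’s filter push-up baseline.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Scan:
    table: str
    rows: int  # table size, assumed public

@dataclass
class Filter:
    child: "Plan"

@dataclass
class Join:
    left: "Plan"
    right: "Plan"
    # "left" or "right" if that side's join column is a foreign key into a
    # primary key on the other side; None if no such constraint is known.
    fk_side: Optional[str] = None

Plan = Union[Scan, Filter, Join]

def size_only_bound(p: Plan) -> int:
    """Upper bound from table sizes alone: a join may be a full cross product."""
    if isinstance(p, Scan):
        return p.rows
    if isinstance(p, Filter):
        return size_only_bound(p.child)
    return size_only_bound(p.left) * size_only_bound(p.right)

def schema_aware_bound(p: Plan) -> int:
    """Tighter bound: a foreign-key join emits at most one row per foreign-key row."""
    if isinstance(p, Scan):
        return p.rows
    if isinstance(p, Filter):
        return schema_aware_bound(p.child)  # a filter never adds rows
    left, right = schema_aware_bound(p.left), schema_aware_bound(p.right)
    if p.fk_side == "left":
        return left
    if p.fk_side == "right":
        return right
    return left * right

# Hypothetical sizes: (loan ⋈ filtered customer) ⋈ region.
plan = Join(
    Join(Scan("loan", 1_000_000), Filter(Scan("customer", 100_000)), fk_side="left"),
    Scan("region", 40_000),
    fk_side="left",
)
print(schema_aware_bound(plan), size_only_bound(plan))
```

With these made-up sizes, padding the result to 1,000,000 rows instead of the 4 × 10^15 cross-product bound still hides the true cardinality without the blow-up.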
Federated planner: determines how to run the query and where to run each operator. Execution modes: Fed: partitioned across all parties’ enclaves • Fed-Obl: partitioned across enclaves + mixed-sensitivity oblivious algorithms • Single-Site-Obl: at the querier’s enclaves + oblivious algorithms. Example plan for SELECT SUM(loan_amount) FROM customer c JOIN loan l ON c.ssn = l.ssn WHERE credit_score < 630; (both input tables Sensitive), bottom up: Customer and Loan (Fed) → Filter on Customer (Fed) → broadcast (data movement) → Broadcast Join (Fed-Obl) → Agg (Fed-Obl) → collect to single site (data movement) → Agg (Single-Site-Obl)
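A plain-Python sketch of the data flow in the example plan, assuming each party holds its own partitions of Customer as (ssn, credit_score) and Loan as (ssn, loan_amount) tuples. Encryption, padding, and oblivious operators are omitted (a nested loop stands in for the oblivious join), and the function name and party data are hypothetical; the rows reuse the example table’s values, split across two parties.

```python
from typing import Dict, List, Tuple

Customer = Tuple[str, int]  # (ssn, credit_score); Loan rows are (ssn, loan_amount)

def federated_subprime_sum(parties: List[Dict[str, list]], threshold: int = 630) -> int:
    # Fed: each party filters its own Customer partition locally.
    filtered = [[c for c in p["customer"] if c[1] < threshold] for p in parties]
    # Broadcast: every party receives all filtered Customer rows.
    broadcast: List[Customer] = [c for part in filtered for c in part]
    # Fed-Obl: each party joins its local Loan partition against the broadcast rows
    # and pre-aggregates (a plain nested loop stands in for an oblivious join).
    partials = [
        sum(amount for l_ssn, amount in p["loan"] for c_ssn, _ in broadcast if l_ssn == c_ssn)
        for p in parties
    ]
    # Collect to single site: the querier adds up the partial sums.
    return sum(partials)

PARTIES = [
    {"customer": [("1", 720), ("2", 600)], "loan": [("1", 2500), ("2", 500)]},
    {"customer": [("3", 600)], "loan": [("2", 250), ("3", 500)]},
]
print(federated_subprime_sum(PARTIES))  # 1250
```

Pre-aggregating at each party before the collect step means only one partial sum per party crosses the wide-area network instead of the joined rows.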
Evaluation setup • 5 geo-distributed parties • ~10 MB/s bandwidth • Synthetic data, table sizes 4.3 MB–10 GB
OCQ vs. prior work [Figure: running time (s, log scale) of Opaque, OCQ, SMCQL, and DJoin on the Comorbidity, Aspirin count, DJoin 41, and DJoin 45 queries] • Orders of magnitude faster than SMCQL and DJoin due to trusted hardware • Faster than Opaque because OCQ can execute initial filters in plaintext
Overhead of OCQ’s security [Figure: running time (s, log scale) of Outsourced Opaque, Plaintext federated, OCQ, OCQ w/ padding, and Outsourced Spark SQL on the Comorbidity, Aspirin count, DJoin 41, and DJoin 45 queries] • 2.2–25x overhead vs. insecure federated or outsourced Spark SQL
Summary of OCQ’s contributions Efficient, general framework for oblivious coopetitive analytics 1. Mixed-sensitivity oblivious join and aggregation algorithms 2. Schema-aware padding 3. Secure coopetitive query planner