Privacy-preserving Information Sharing: Crypto Tools and Applications
Emiliano De Cristofaro
University College London (UCL)
https://emilianodc.com
Privacy-preserving what?
Parties with limited mutual trust are willing, or required, to share information.
Only the minimum amount of information required should be disclosed in the process.
Outline
1. Tools for two parties, and a case study
2. Some applications
3. Multiple parties
4. Inference from shared information
Let’s start with two parties…
Secure Computation (2PC)
Alice holds input a, Bob holds input b; both learn f(a, b) and nothing else about the other party’s input.
Security? Goldreich to the rescue!
Oded Goldreich. Foundations of Cryptography: Basic Applications, Ch. 7.2. Cambridge Univ Press, 2004.
Computational indistinguishability: execution in the “ideal world” (with a trusted third party, TTP) vs. execution in the “real world” (the crypto protocol).
Who are the Adversaries?
Outside adversaries? Not considered: network security “takes care” of that.
Honest-but-curious (HbC): “honest” in that it follows the protocol specification and does not alter its inputs; “curious” in that it attempts to infer the other party’s input.
Malicious: arbitrary deviations from the protocol. Security is a bit harder to formalize/prove (one needs to simulate the ideal world).
How to Implement 2PC?
1. Garbled circuits: the sender prepares a garbled circuit and sends it to the receiver, who obliviously evaluates it, learning the encodings corresponding to both her and the sender’s output.
2. Special-purpose protocols: implement one specific function (and only that?), usually based on public-key crypto properties (e.g., homomorphic encryption).
Privacy-Preserving Information Sharing with 2PC?
How do we map information sharing to f(·,·)? How do we realize a secure f(·,·) efficiently? How do we quantify the information disclosed by the output of f(·,·)?
A Case Study: Private Set Intersection
Private Set Intersection (PSI)
Server’s input: S = {s_1, …, s_w}. Client’s input: C = {c_1, …, c_v}.
Output: S ∩ C.
Private Set Intersection?
DHS (terrorist watch list) and an airline (passenger list): find out whether any suspect is on a given flight.
IRS (tax evaders) and a Swiss bank (customers): discover whether tax evaders have accounts at foreign banks.
Etc.
Straightforward PSI
Server’s input: S = {s_1, …, s_w}. Client’s input: C = {c_1, …, c_v}.
Straightforward PSI?
For each item s, the Server sends SHA-256(s).
For each item c, the Client computes SHA-256(c).
Learn the intersection by matching SHA-256’s outputs.
What’s the problem with this?
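The problem can be seen in a few lines of code. The sketch below (illustrative item values, assuming email addresses as set elements) shows that matching works, but also that an unkeyed hash of a low-entropy item offers no privacy: anyone can brute-force candidate inputs offline against the published hashes.

```python
import hashlib

def h(item: str) -> str:
    # Naive "protection": a public, unkeyed hash of each item.
    return hashlib.sha256(item.encode()).hexdigest()

server_set = {"alice@example.com", "bob@example.com"}
client_set = {"bob@example.com", "carol@example.com"}

# Server publishes hashes; client matches its own hashes against them.
server_hashes = {h(s) for s in server_set}
intersection = {c for c in client_set if h(c) in server_hashes}
print(intersection)  # {'bob@example.com'}

# The problem: the hash is deterministic and unkeyed, so the client
# can test arbitrary guesses offline against the server's hashes.
guesses = ["alice@example.com", "dave@example.com"]
recovered = [g for g in guesses if h(g) in server_hashes]
print(recovered)  # ['alice@example.com'] -- the server's "private" item leaks
```

This is why the following slides replace the public hash with a *keyed* function that the client can only evaluate with the server's help.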
Background: Pseudorandom Functions (PRFs)
A deterministic, keyed function f: (k, x) ↦ f_k(x).
Efficient to compute; without the key, the outputs of the function “look” random.
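A quick sketch of these two properties, using HMAC-SHA256 as a standard PRF instantiation (the choice of HMAC here is mine, not the slides'):

```python
import hmac
import hashlib
import secrets

def prf(k: bytes, x: bytes) -> bytes:
    # HMAC-SHA256 as a PRF: fully determined by (k, x),
    # but unpredictable-looking without knowledge of k.
    return hmac.new(k, x, hashlib.sha256).digest()

k = secrets.token_bytes(32)
y1 = prf(k, b"hello")
y2 = prf(k, b"hello")
y3 = prf(k, b"hellp")   # slightly different input

assert y1 == y2         # deterministic: same key and input, same output
assert y1 != y3         # outputs on different inputs look unrelated
```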
Oblivious PRF (OPRF)
The client inputs x; the server inputs the key k. The client learns f_k(x); the server learns nothing about x.
OPRF-based PSI
The server, holding key k and set S = {s_1, …, s_w}, sends T′_j = f_k(s_j) for each s_j ∈ S.
The client, holding C = {c_1, …, c_v}, obtains T_i = f_k(c_i) for each c_i ∈ C by running the OPRF with the server.
The client outputs the c_i whose T_i matches some T′_j.
Unless s_j is in the intersection, T′_j looks random to the client.
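The matching logic can be sketched as below. One caveat: a real implementation evaluates `server.prf` through an *oblivious* protocol (such as the blind-RSA construction on the next slide), so the server never sees the client's items; here the call is made directly just to show the data flow.

```python
import hmac
import hashlib
import secrets

class Server:
    def __init__(self, items):
        self.k = secrets.token_bytes(32)  # PRF key, never leaves the server
        self.items = items

    def prf(self, x: bytes) -> bytes:
        return hmac.new(self.k, x, hashlib.sha256).digest()

    def published_tags(self):
        # T'_j = f_k(s_j) for every server item
        return {self.prf(s.encode()) for s in self.items}

def psi_client(client_items, server):
    tags = server.published_tags()
    # In the real protocol each f_k(c_i) is obtained via the OPRF,
    # hiding c_i from the server; here we call prf() directly.
    return {c for c in client_items if server.prf(c.encode()) in tags}

srv = Server({"s1", "s2", "shared"})
print(psi_client({"shared", "c1"}, srv))  # {'shared'}
```

Because the tags are keyed, the client can no longer brute-force non-intersection items offline, which fixes the flaw of the straightforward hashing approach.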
OPRF from Blind-RSA Signatures
RSA signatures: public key (N = p·q, e), secret key d, with e·d ≡ 1 mod (p−1)(q−1).
Sig_d(x) = H(x)^d mod N;  Ver(Sig(x), x) = 1 ⇔ Sig(x)^e = H(x) mod N.
PRF: f_d(x) = H(Sig_d(x))  (H a one-way function).
Protocol (Server holds d, Client holds x):
1. Client picks random r ∈ Z_N and sends a = H(x) · r^e mod N.
2. Server replies b = a^d mod N.
3. Client unblinds: Sig_d(x) = b / r  (since b = H(x)^d · r^{ed} = H(x)^d · r mod N), and outputs f_d(x) = H(Sig_d(x)).
Performance
[Chart: total running time (ms) vs. set sizes w = v, from 1,000 to 10,000, for medium sets with |N| = 1024.]
See: De Cristofaro, Lu, Tsudik. Efficient Techniques for Privacy-preserving Sharing of Sensitive Information. TRUST 2011.
PSI w/ Data Transfer (PSI-DT)
Server’s input: S = {(s_1, data_1), …, (s_w, data_w)}. Client’s input: C = {c_1, …, c_v}.
Output: S ∩ C = { (s_j, data_j) : ∃ c_i ∈ C such that c_i = s_j }.
How can we build PSI-DT?
PSI w/ Data Transfer (protocol diagram omitted)
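One common way to build PSI-DT from an OPRF (a sketch of the general idea, not necessarily the construction in the original slides): the server derives, from f_k(s_j), both a tag and an encryption key, publishing (tag, Enc(data_j)) pairs; the client obtains f_k(c_i) via the OPRF and can decrypt exactly the records it matches. The stream cipher below is a toy built from SHA-256, with the OPRF call again made directly for readability.

```python
import hmac
import hashlib
import secrets

k = secrets.token_bytes(32)

def f(x: str) -> bytes:
    # Server's PRF; in the real protocol the client evaluates it via an OPRF.
    return hmac.new(k, x.encode(), hashlib.sha256).digest()

def tag_of(y: bytes) -> bytes:
    return hashlib.sha256(b"tag|" + y).digest()

def key_of(y: bytes) -> bytes:
    return hashlib.sha256(b"key|" + y).digest()

def keystream(key: bytes, n: int) -> bytes:
    # Toy stream cipher from SHA-256 (demo only, not authenticated).
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(u ^ v for u, v in zip(a, b))

# Server: publish (tag, ciphertext) pairs; data_j is recoverable only
# with knowledge of f_k(s_j), i.e., only for matched items.
records = {"s1": b"data for s1", "shared": b"data for shared"}
published = {tag_of(f(s)): xor(data, keystream(key_of(f(s)), len(data)))
             for s, data in records.items()}

# Client: obtains f_k(c_i) via the OPRF (called directly here),
# then decrypts the records whose tags match.
result = {}
for c in ["shared", "c1"]:
    y = f(c)
    t = tag_of(y)
    if t in published:
        ct = published[t]
        result[c] = xor(ct, keystream(key_of(y), len(ct)))
print(result)  # {'shared': b'data for shared'}
```

Deriving the tag and the key with separate domain labels matters: if the published tag itself were the decryption key, every ciphertext would be decryptable by anyone.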
A closer look at PSI
Server’s input: S = {s_1, …, s_w}. Client’s input: C = {c_1, …, c_v}; the client learns S ∩ C.
What if the client populates C with its best guesses for S?
The client needs to prove that its inputs satisfy a policy, or that they are authorized.
Authorizations are issued by an appropriate authority and need to be verified implicitly.
Authorized Private Set Intersection (APSI)
Server’s input: S = {s_1, …, s_w}. Client’s input: C = {(c_1, auth(c_1)), …, (c_v, auth(c_v))}, with authorizations issued by a CA.
Output: S ∩ C ≝ { s_j ∈ S : ∃ c_i ∈ C such that c_i = s_j ∧ auth(c_i) is valid }.
OPRF w/ Implicit Signature Verification (ISV)
The client inputs (x, sig(x)); the server inputs the key k.
The client learns f_k(x) if Ver(sig(x), x) = 1, and a random value ($) otherwise.
A simple OPRF-like with ISV
A court (the CA) issues authorizations: Sig(x) = H(x)^d mod N.
OPRF: f_k(x) = F(H(x)^{2k} mod N).
Protocol (Client holds H(x)^d, Server holds k):
1. Client picks random r ∈ Z_N and sends a = H(x)^d · g^r.
2. Server replies b = a^{2ek} and g^k.
3. Client computes H(x)^{2k} = b / g^{2erk}  (since b = H(x)^{2edk} · g^{2erk} = H(x)^{2k} · g^{2erk}), and outputs f_k(x) = F(H(x)^{2k}).
(Implicit verification: without a valid signature on x, the client’s output looks random.)
OPRF with ISV – Malicious Security
Same OPRF, f_k(x) = F(H(x)^{2k}), but both parties now attach zero-knowledge proofs of knowledge (ZKPKs) of correct behavior: the client proves in π that its blinded message a is well-formed with respect to its signed input, and the server proves in π′ that b = a^{2ek} was computed with its key k. The client then unblinds as before, H(x)^{2k} = b / g^{2erk}, and outputs f_k(x) = F(H(x)^{2k}).
Proofs in Malicious Model
See: De Cristofaro, Kim, Tsudik. Linear-Complexity Private Set Intersection Protocols Secure in Malicious Model. Asiacrypt 2010.
PSI with Garbled Circuits
Lots of progress recently: optimized circuits, oblivious transfer (OT) extensions, better techniques to extend to malicious security.
See: Pinkas et al. Scalable Private Set Intersection Based on OT Extension. ACM TOPS 2018.
Quiz! Go to kahoot.it
Applications to Genomics
(Figure from: James Bannon, ARK)
Genome Privacy
1. The genome is a treasure trove of sensitive information
2. The genome is the ultimate identifier
3. Genome data cannot be revoked
4. Access to one’s genome ≈ access to relatives’ genomes
5. Sensitivity does not degrade over time
See: genomeprivacy.org
Genetic Paternity Test
A strawman approach: on average, ~99.5% of any two human genomes are identical, and parents and children have even more similar genomes. Compare the candidate’s genome with that of the alleged child, and test positive if the percentage of matching nucleotides exceeds 99.5 + τ.
First-attempt privacy-preserving protocol: use secure computation for the comparison.
PROs: high accuracy and error resilience.
CONs: performance is not promising (3 billion symbols in input); in our experiments, the computation takes a few days.
Genetic Paternity Test
Wait a minute! If ~99.5% of any two human genomes are identical, why don’t we compare only the remaining 0.5%? We could simply count how many of those positions match. But… we don’t know (yet?) where exactly this 0.5% occurs!
Private RFLP-based Paternity Test
Run a Private Set Intersection Cardinality (PSI-CA) protocol on the RFLP fragment lengths; the test result is the number of fragments with the same length.
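The ideal functionality the PSI-CA protocol realizes is just the intersection cardinality; the fragment lengths and the decision threshold below are hypothetical placeholders (real values depend on the restriction enzymes and markers used).

```python
def psi_ca(fragments_a: set, fragments_b: set) -> int:
    # Ideal functionality of PSI-CA: each party learns only the size
    # of the intersection, never the other party's set.
    return len(fragments_a & fragments_b)

# Hypothetical RFLP fragment-length sets (in base pairs).
alleged_parent = {102, 254, 310, 488, 730, 1021}
child          = {102, 254, 310, 488, 730, 997}

matches = psi_ca(alleged_parent, child)
THRESHOLD = 5  # illustrative cut-off, not a clinically meaningful value
print("match" if matches >= THRESHOLD else "no match")  # match
```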
Personalized Medicine (PM)
Drugs designed for patients’ genetic features: associating a drug with a unique genetic fingerprint yields maximum effectiveness for patients with a matching genome. Test the drug’s “genetic fingerprint” against the patient’s genome.
Examples:
TPMT gene (relevant to leukemia): (1) a G->C mutation in pos. 238 of the gene’s c-DNA, or (2) a G->A mutation in pos. 460 together with an A->G mutation in pos. 419, cause the TPMT disorder (relevant for leukemia patients).
HLA-B gene (relevant to HIV treatment): one G->T mutation (known as the hla-B*5701 allelic variant) is associated with extreme sensitivity to abacavir (an HIV drug).
Reducing P³MT to APSI
Intuition: the FDA acts as the CA, the pharmaceutical company as the Client, the patient as the Server.
Patient’s private input set: G = { (b_i || i) } for i = 1, …, 3·10⁹, with b_i ∈ {A, C, G, T}.
Company’s input set: fp(D) = { (b*_j || j) }, each element carrying auth(b*_j || j) issued by the CA.
Running APSI on G and fp(D) yields the test result.
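The encoding itself is easy to illustrate: each nucleotide is concatenated with its position, and the PM test asks whether the drug's fingerprint is contained in the patient's set. The sketch below shows only this plaintext reduction; the APSI protocol additionally keeps both sets private and checks the CA's authorization on every fingerprint element (positions and bases below are illustrative, not real markers).

```python
def encode(positions: dict) -> set:
    # positions maps index i -> nucleotide b_i; elements are b_i || i.
    return {f"{b}||{i}" for i, b in positions.items()}

# Hypothetical excerpts; a real genome set has ~3 * 10^9 elements.
patient_genome   = encode({238: "C", 460: "G", 419: "A"})
drug_fingerprint = encode({238: "C"})   # e.g., a single-mutation fingerprint

# Test result: does the patient carry every marker in the fingerprint?
test_result = drug_fingerprint <= patient_genome
print(test_result)  # True
```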
Multiple Parties?
Sharing Statistics?
Examples:
1. Smart metering
2. Recommender systems for online streaming services
3. Statistics about mass-transport movements
4. Traffic statistics for the Tor network
How about privacy?
Private Recommendations
The BBC keeps 500-1000 free programmes on iPlayer, with no tracking and no ads (it is taxpayer funded). It is valuable to gather statistics and give recommendations (“You might also like”): e.g., “similar” users have watched both Dr Who and Sherlock; you have only watched Sherlock, so why don’t you watch Dr Who?