Frequency-hiding Dependency-preserving Encryption for Outsourced Databases
ICDE’17
Boxiang Dong (1), Wendy Wang (2)
(1) Montclair State University, Montclair, NJ
(2) Stevens Institute of Technology, Hoboken, NJ
April 20, 2017
Data-Management-as-a-Service (DMaS)
[Figure: the data owner outsources dataset D to the server]
• The data owner has limited computational resources.
• The server (e.g., a cloud) is computationally powerful.
• Outsourcing provides a cost-effective solution for data management.
Functional Dependency (FD)
Definition: An FD $X \rightarrow Y$ states that for any two records $r_1$ and $r_2$, $r_1[X] = r_2[X]$ implies $r_1[Y] = r_2[Y]$.
Applications
• Data schema improvement via normalization
• Data inconsistency repair
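The definition can be checked mechanically. The sketch below is only an illustration: the function name `holds_fd` and the three-record table are hypothetical and do not come from the talk.

```python
from typing import Dict, Sequence, Tuple

Record = Dict[str, str]


def holds_fd(records: Sequence[Record], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if the FD lhs -> rhs holds on the given records."""
    seen: Dict[Tuple[str, ...], Tuple[str, ...]] = {}
    for r in records:
        x = tuple(r[a] for a in lhs)
        y = tuple(r[a] for a in rhs)
        # The first record with this X value fixes the only Y value allowed for it.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical toy table (not taken from the slides).
table = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b1", "C": "c2"},
    {"A": "a2", "B": "b2", "C": "c2"},
]

print(holds_fd(table, ["A"], ["B"]))  # True: equal A values always carry equal B values
print(holds_fd(table, ["A"], ["C"]))  # False: a1 appears with both c1 and c2
```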
Outsourcing Requirement
[Figure: data owner and malicious server]
Privacy Concern
• Protect sensitive information from the untrusted server.
• Encrypt the dataset before outsourcing.
Utility Concern
• Support FD-based applications.
• The encryption scheme should preserve FDs.
Challenges
Directly applying deterministic encryption (e.g., RSA) is vulnerable to the frequency-analysis attack (FA attack) [N+15].
FA-Attack($P$, $E$)
1. compute $\pi \leftarrow vSort(Hist(P))$
2. compute $\phi \leftarrow vSort(Hist(E))$
3. foreach $e \in E$: output $p$ if $Rank_{\phi}(e) = Rank_{\pi}(p)$
(a) Base table $D$ ($A \rightarrow B$, $A \nrightarrow C$, $B \nrightarrow C$)
ID   A    B    C
r1   a1   b1   c1
r2   a1   b1   c2
r3   a1   b1   c4
r4   a1   b1   c3
r5   a2   b2   c3
r6   a2   b2   c4
(b) $\hat{D}_1$: deterministic encryption
ID   A             B             C
r1   $\hat{a}_1$   $\hat{b}_1$   $\hat{c}_1$
r2   $\hat{a}_1$   $\hat{b}_1$   $\hat{c}_2$
r3   $\hat{a}_1$   $\hat{b}_1$   $\hat{c}_4$
r4   $\hat{a}_1$   $\hat{b}_1$   $\hat{c}_3$
r5   $\hat{a}_2$   $\hat{b}_2$   $\hat{c}_3$
r6   $\hat{a}_2$   $\hat{b}_2$   $\hat{c}_4$
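The three steps of FA-Attack can be sketched as follows. This is a minimal illustration that reads Hist as a histogram and vSort as a sort by descending frequency; that reading, the tie-breaking rule, and the ciphertext labels `ct_9` and `ct_3` are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def rank_by_frequency(values: Sequence[Hashable]) -> List[Hashable]:
    """Distinct values ordered by descending frequency.

    Ties are broken by repr() only to make the output deterministic (an assumption).
    """
    hist = Counter(values)
    return sorted(hist, key=lambda v: (-hist[v], repr(v)))


def fa_attack(plain_aux: Sequence[Hashable], cipher: Sequence[Hashable]) -> Dict[Hashable, Hashable]:
    """Map each ciphertext value to the plaintext value of the same frequency rank."""
    pi = rank_by_frequency(plain_aux)   # step 1: rank the plaintext values
    phi = rank_by_frequency(cipher)     # step 2: rank the ciphertext values
    return dict(zip(phi, pi))           # step 3: match equal ranks


# Column A of the base table: a1 occurs four times, a2 twice.
plain_column = ["a1"] * 4 + ["a2"] * 2
# Hypothetical deterministic ciphertexts with the same frequencies.
cipher_column = ["ct_9"] * 4 + ["ct_3"] * 2

guess = fa_attack(plain_column, cipher_column)
print(guess)  # {'ct_9': 'a1', 'ct_3': 'a2'}
```

Because deterministic encryption keeps every frequency unchanged, rank matching recovers the whole column in this toy case.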
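Challenges
Applying probabilistic encryption may destroy original FDs or introduce false positive FDs. In the tables below, the subscript is the plaintext index and the superscript distinguishes ciphertext instances of the same plaintext.
(c) $\hat{D}_2$: probabilistic encryption on A, B, C individually. The original FD $A \rightarrow B$ is destroyed.
ID   A               B               C
r1   $\hat{a}_1^1$   $\hat{b}_1^1$   $\hat{c}_1^1$
r2   $\hat{a}_1^2$   $\hat{b}_1^2$   $\hat{c}_2^1$
r3   $\hat{a}_1^3$   $\hat{b}_1^3$   $\hat{c}_4^2$
r4   $\hat{a}_1^4$   $\hat{b}_1^4$   $\hat{c}_3^1$
r5   $\hat{a}_2^1$   $\hat{b}_2^1$   $\hat{c}_3^2$
r6   $\hat{a}_2^1$   $\hat{b}_2^2$   $\hat{c}_4^1$
(d) $\hat{D}_3$: probabilistic encryption on (A, B, C). The false positive FD $A \rightarrow C$ is introduced.
ID   A               B               C
r1   $\hat{a}_1^1$   $\hat{b}_1^1$   $\hat{c}_1^1$
r2   $\hat{a}_1^2$   $\hat{b}_1^2$   $\hat{c}_2^2$
r3   $\hat{a}_1^3$   $\hat{b}_1^3$   $\hat{c}_4^3$
r4   $\hat{a}_1^4$   $\hat{b}_1^4$   $\hat{c}_3^4$
r5   $\hat{a}_2^5$   $\hat{b}_2^5$   $\hat{c}_3^5$
r6   $\hat{a}_2^6$   $\hat{b}_2^6$   $\hat{c}_4^6$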
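Both failure modes can be verified directly on the example tables. The sketch below transcribes the base table and the two encrypted tables as read from the slide, writing a ciphertext such as $\hat{a}_1^2$ as the string `a1^2`; the helper names and this string encoding are assumptions made for illustration.

```python
def table(*rows):
    """Build a list of records over the attributes A, B, C."""
    return [dict(zip("ABC", row)) for row in rows]


def holds_fd(rows, lhs, rhs):
    """Single-attribute FD check: every lhs value pairs with exactly one rhs value."""
    pairs = {(r[lhs], r[rhs]) for r in rows}
    return len({x for x, _ in pairs}) == len(pairs)


base = table(("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a1", "b1", "c4"),
             ("a1", "b1", "c3"), ("a2", "b2", "c3"), ("a2", "b2", "c4"))

# Table (c): attributes encrypted individually.
d2 = table(("a1^1", "b1^1", "c1^1"), ("a1^2", "b1^2", "c2^1"), ("a1^3", "b1^3", "c4^2"),
           ("a1^4", "b1^4", "c3^1"), ("a2^1", "b2^1", "c3^2"), ("a2^1", "b2^2", "c4^1"))

# Table (d): whole records encrypted, so every ciphertext is unique.
d3 = table(("a1^1", "b1^1", "c1^1"), ("a1^2", "b1^2", "c2^2"), ("a1^3", "b1^3", "c4^3"),
           ("a1^4", "b1^4", "c3^4"), ("a2^5", "b2^5", "c3^5"), ("a2^6", "b2^6", "c4^6"))

print(holds_fd(base, "A", "B"), holds_fd(d2, "A", "B"))  # True False: A -> B destroyed
print(holds_fd(base, "A", "C"), holds_fd(d3, "A", "C"))  # False True: A -> C is a false positive
```

In table (c), records r5 and r6 share the ciphertext of a2 but carry different ciphertexts of b2, which breaks $A \rightarrow B$. In table (d) no value repeats, so every FD holds trivially.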
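Challenges
The FD-preserving property introduces a new inference attack [PR12].
1. The adversary chooses $(D_0, FD_0)$ and $(D_1, FD_1)$.
2. A bit $b \xleftarrow{\$} \{0, 1\}$ is sampled, and $D_b$ is encrypted with an FD-preserving, CPA-secure cipher, giving $\hat{D}_b$.
3. The adversary outputs $b' = 0$ if $FD_0$ holds on $\hat{D}_b$, and $b' = 1$ otherwise.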
Our Contributions
Security Definition
• $\alpha$-security against the FA attack
• Indistinguishability against FD-preserving chosen plaintext attack (IND-FCPA)
Encryption Scheme
We design $F^2$, a frequency-hiding, FD-preserving encryption scheme based on probabilistic encryption.
Outline
1 Introduction
2 Related Work
3 Security Model
4 Encryption Scheme
  • Step 1: Identifying Maximal Attribute Sets
  • Step 2: Splitting-and-Scaling Encryption
  • Step 3: Conflict Resolution
  • Step 4: Eliminating False Positive FDs
5 Experiments
6 Conclusion
Related Work
Privacy-preserving outsourced computing
• Data encoding [H+02a, H+02b]
• Data encryption [S+00, P+12]
• Property-preserving encryption [Ker15, B+11, G+06, B+09]
Inference attack
• FA attack [N+15]
• Query-recovery attack [I+12]
FD applications
• Data cleaning [T+11]
• Schema design [BFFR05, B+07]
Security Model
Experiment $Exp^{FA}_{\Pi}()$
  $p' \leftarrow \mathcal{A}^{freq_E(e),\, freq(P)}$
  Return 1 if $p' = Decrypt(k, e)$
  Return 0 otherwise
$Adv^{FA}_{\Pi}(\mathcal{A}) = Prob(Exp^{FA}_{\Pi}(\mathcal{A}) = 1)$ measures the success rate of the FA attack.
Definition ($\alpha$-security against FA attack): An encryption scheme $\Pi$ is $\alpha$-secure against FA if for every adversary $\mathcal{A}$ it holds that $Adv^{FA}_{\Pi}(\mathcal{A}) \leq \alpha$, where $\alpha \in (0, 1]$ is user specified.
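Security Model
The server may exploit the FDs to break the cipher.
Experiment $Exp^{FCPA}_{\Pi}()$
1. The adversary submits $(D_0, FD)$ and $(D_1, FD)$ with $|D_0| = |D_1|$.
2. A bit $b \xleftarrow{\$} \{0, 1\}$ is sampled, and $D_b$ is encrypted with the encryption scheme $\Pi$, giving $\hat{D}_b$.
3. The adversary outputs $b'$.
4. Return 1 if $b = b'$, and 0 otherwise.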
Security Model
$Adv^{FCPA}_{\Pi}(\mathcal{A}) = Prob(Exp^{FCPA}_{\Pi}(\mathcal{A}) = 1) - 1/2$ measures the advantage of the FCPA attack over a random guess.
Definition (Indistinguishability against FD-preserving Chosen Plaintext Attack (IND-FCPA)): An encryption scheme $\Pi$ is IND-FCPA if for any polynomial-time adversary $\mathcal{A}$ the advantage is negligible in $\lambda$, i.e., $Adv^{FCPA}_{\Pi}(\mathcal{A}) = negl(\lambda)$, where $\lambda$ is a pre-defined security parameter.
$F^2$ Encryption Scheme - Overview
$F^2$, a frequency-hiding, FD-preserving encryption scheme, consists of four steps.
Step 1. Identifying Maximal Attribute Sets
Step 2. Splitting-and-Scaling Encryption
Step 3. Conflict Resolution
Step 4. Eliminating False Positive FDs
[Figure: pipeline from the input dataset $D$ through the four steps; the diagram also labels $\bar{D}$, $\Delta D$ and $\hat{D}$]
Step 1 - Identifying Maximal Attribute Sets
Theorem: Given a dataset $D$ and an FD $X \rightarrow Y$, suppose we apply a probabilistic encryption scheme to an attribute set $A$ and obtain $\hat{D}$. Then $\hat{D}$ preserves $X \rightarrow Y$ if $(X \cup Y) \subseteq A$.
Step 1 - Identifying Maximal Attribute Sets
Definition (Maximal Attribute Set (MAS)): Given a dataset $D$, an attribute set $A$ is a MAS if (1) at least one instance of $A$ occurs more than once in $D$, and (2) no superset of $A$ satisfies condition (1).
Step 1 - Identifying Maximal Attribute Sets
Lemma: Given a dataset $D$ and an FD $X \rightarrow Y$, there must exist at least one MAS $M$ such that $(X \cup Y) \subseteq M$.
Step 1 - Identifying Maximal Attribute Sets
• To preserve FDs, we need to find the MASs of the dataset.
• We adapt an efficient solution named Ducc [H+13].
• The complexity is much lower than that of FD discovery.
ID   A    B    C
r1   a2   b1   c1
r2   a1   b1   c1
r3   a1   b1   c2
r4   a3   b1   c2
r5   a4   b2   c2
r6   a5   b2   c3
FD: $A \rightarrow B$
MAS = {AB, BC}
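The MAS definition can be applied to the example table by exhaustive search. The sketch below is a brute-force illustration only and is not the Ducc-based method used in the talk; the function names are hypothetical, and the search is exponential in the number of attributes.

```python
from collections import Counter
from itertools import combinations


def has_repeated_instance(rows, attrs):
    """Condition (1): some instance of attrs occurs more than once."""
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    return any(c > 1 for c in counts.values())


def find_mas(rows, attributes):
    """Scan attribute sets from largest to smallest and keep the maximal ones."""
    found = []
    for size in range(len(attributes), 0, -1):
        for attrs in combinations(attributes, size):
            # A subset of an already found MAS has a qualifying superset,
            # so it fails condition (2).
            if any(set(attrs) <= m for m in found):
                continue
            if has_repeated_instance(rows, attrs):
                found.append(frozenset(attrs))
    return found


rows = [dict(zip("ABC", r)) for r in [
    ("a2", "b1", "c1"), ("a1", "b1", "c1"), ("a1", "b1", "c2"),
    ("a3", "b1", "c2"), ("a4", "b2", "c2"), ("a5", "b2", "c3"),
]]

mas = find_mas(rows, ["A", "B", "C"])
print([sorted(m) for m in mas])  # [['A', 'B'], ['B', 'C']]
```

On this table the instance (a1, b1) repeats for AB and (b1, c1) repeats for BC, while every instance of AC and ABC is unique, which matches MAS = {AB, BC}.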
Step 2 - Splitting-and-Scaling Encryption
for all MAS do
    Construct equivalence classes (ECs)
end for
ID   B    C    EC
r1   b1   c1   C1
r2   b1   c1   C1
r3   b1   c2   C2
r4   b1   c2   C2
r5   b2   c2   C3
r6   b2   c3   C4
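Constructing the ECs amounts to grouping records by their values on the MAS. A minimal sketch for the MAS BC of the example, with a hypothetical function name and record layout:

```python
from collections import defaultdict


def equivalence_classes(rows, mas):
    """Group record IDs by their instance on the attributes of the MAS."""
    classes = defaultdict(list)
    for rid, row in rows.items():
        classes[tuple(row[a] for a in mas)].append(rid)
    return dict(classes)


rows = {
    "r1": {"B": "b1", "C": "c1"},
    "r2": {"B": "b1", "C": "c1"},
    "r3": {"B": "b1", "C": "c2"},
    "r4": {"B": "b1", "C": "c2"},
    "r5": {"B": "b2", "C": "c2"},
    "r6": {"B": "b2", "C": "c3"},
}

ecs = equivalence_classes(rows, ["B", "C"])
for instance, members in ecs.items():
    print(instance, members)
# ('b1', 'c1') ['r1', 'r2']
# ('b1', 'c2') ['r3', 'r4']
# ('b2', 'c2') ['r5']
# ('b2', 'c3') ['r6']
```

The four groups correspond to C1 through C4 in the table above.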
Step 2 - Splitting-and-Scaling Encryption
for all MAS do
    Construct equivalence classes (ECs)
    Organize ECs into collision-free groups of size at least $1/\alpha$
end for
Example with $\alpha = 1/2$, so each group contains at least 2 ECs. The figure organizes the ECs into two groups, ECG1 and ECG2.
ID   B    C    EC
r1   b1   c1   C1
r2   b1   c1   C1
r3   b1   c2   C2
r4   b1   c2   C2
r5   b2   c2   C3
r6   b2   c3   C4
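One way to form such groups is a first-fit pass over the ECs. The sketch below is not the authors' grouping procedure. It assumes that two ECs collide when they share a value on at least one attribute, and it only reports undersized groups instead of repairing them. Under that assumed rule, the grouping it produces need not coincide with ECG1 and ECG2 as drawn on the slide.

```python
import math


def collide(ec1, ec2):
    """Assumed collision rule: the two instances agree on some attribute."""
    return any(v1 == v2 for v1, v2 in zip(ec1, ec2))


def group_collision_free(ecs, alpha):
    """First-fit grouping; returns (groups, groups smaller than ceil(1/alpha))."""
    min_size = math.ceil(1 / alpha)
    groups = []
    for ec in ecs:
        for group in groups:
            if not any(collide(ec, other) for other in group):
                group.append(ec)
                break
        else:
            # No existing group can take this EC without a collision.
            groups.append([ec])
    undersized = [g for g in groups if len(g) < min_size]
    return groups, undersized


# Instances of C1..C4 on the MAS BC.
ecs = [("b1", "c1"), ("b1", "c2"), ("b2", "c2"), ("b2", "c3")]
groups, undersized = group_collision_free(ecs, alpha=0.5)
print(groups)      # [[('b1', 'c1'), ('b2', 'c2')], [('b1', 'c2'), ('b2', 'c3')]]
print(undersized)  # []
```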
Step 2 - Splitting-and-Scaling Encryption
for all MAS do
    Construct equivalence classes (ECs)
    Organize ECs into collision-free groups of size at least $1/\alpha$
    Apply splitting and scaling to reach the same frequency
end for
Splitting: Split an EC into $\omega$ copies with the same frequency.
Scaling: Duplicate an EC to reach frequency homogenization.
In the example, C1 and C2 are split. The subscript of a ciphertext is the plaintext index and the superscript distinguishes ciphertext instances.
ID   B    C    EC   Encrypted B     Encrypted C
r1   b1   c1   C1   $\hat{b}_1^1$   $\hat{c}_1^1$
r2   b1   c1   C1   $\hat{b}_1^2$   $\hat{c}_1^2$
r3   b1   c2   C2   $\hat{b}_1^3$   $\hat{c}_2^1$
r4   b1   c2   C2   $\hat{b}_1^4$   $\hat{c}_2^2$
r5   b2   c2   C3
r6   b2   c3   C4
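The arithmetic behind splitting and scaling can be illustrated with a simplified model. In this sketch every ciphertext instance in a group must end up with the same target frequency, which is supplied by the caller. The `target` parameter, the function name, and the EC labels X and Y are hypothetical, and the authors' algorithm for choosing the strategy is not reproduced here.

```python
import math


def split_and_scale(freqs, target):
    """For each EC, return how many copies it is split into and how many
    duplicate records must be added so that every copy has frequency `target`."""
    if target < 1:
        raise ValueError("target frequency must be at least 1")
    plan = {}
    for ec, f in freqs.items():
        copies = math.ceil(f / target)   # splitting: number of ciphertext instances
        added = copies * target - f      # scaling: duplicates that fill the last copy
        plan[ec] = {"copies": copies, "added": added}
    return plan


# Hypothetical group: X occurs 5 times, Y occurs twice.
plan = split_and_scale({"X": 5, "Y": 2}, target=2)
print(plan)
# {'X': {'copies': 3, 'added': 1}, 'Y': {'copies': 1, 'added': 0}}
```

With a target frequency of 1, the ECs C1 and C2 of the example (frequency 2 each) are split into two copies with no duplicates added, which agrees with the four distinct ciphertexts of b1 shown above.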
Step 2 - Splitting-and-Scaling Encryption
We design an algorithm that decides the splitting and scaling strategy so as to minimize the amount of duplication.