
Secure Linear Regression on Vertically Partitioned Datasets

Adria Gascon, Phillipp Schoppmann, Borja Balle, Mariana Raykova, Samee Zahur, Jack Doerner, David Evans. Cryptography in the …


  1. Secure Linear Regression on Vertically Partitioned Datasets
     Adria Gascon, Phillipp Schoppmann, Borja Balle, Mariana Raykova, Samee Zahur, Jack Doerner, David Evans
     Cryptography in the RAM Computation Model, 6/18/16

  2. Predictive Model

     Patient | Blood Count     | Heart Conditions       | Digestive Tract             | … | Medicine Effectiveness
             | RBC   WBC   …   | Murmur  Arrhythmia  …  | Inflammation  Dysphagia  …  |   |
     A       | 3.9   10.0      | 0       0              | 0             1             |   | 1
     B       | 5.0   4.5       | 1       0              | 1             2             |   | 1.5
     C       | 2.5   11        | 0       1              | 1             0             |   | 2
     D       | 4.3   5.3       | 2       1              | 0             1             |   | 1
     …

     • Given samples $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$
         o $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$
     • Learn a function $f$ such that $f(x_i) = y_i$

  3. Linear Regression
     (Patient table as on slide 2.)
     • Given samples $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$
         o $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$
     • Learn a function $f$ such that $f(x_i) = y_i$
     • $f$ is well approximated by a linear map: $y_i \approx \theta^T x_i$

  4. Secure Computation
     (Patient table as on slide 2.)
     • Shared database: $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ do not belong to the same party
     • Compute $\theta$ securely ($y_i \approx \theta^T x_i$)

  5. Horizontally Partitioned Database
     (Patient table as on slide 2.)
     • Different rows belong to different parties
         o E.g., each patient has their own information

  6. Vertically Partitioned Database
     (Patient table as on slide 2.)
     • Different columns belong to different parties
         o E.g., different specialized hospitals have different parts of the information for all patients

  7. Ridge Regression
     • Computing linear model on inputs $(x_1, y_1), \dots, (x_n, y_n)$
         o $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$
     • Optimization formulation
     • Linear system formulation
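
The two formulations named on this slide are not legible in the transcript. As a sketch, the textbook form of ridge regression is given below. The $\frac{1}{n}$ normalization, the matrix $X \in \mathbb{R}^{n \times d}$ with rows $x_i^T$, and the vector $Y \in \mathbb{R}^n$ are assumptions made here for illustration and may differ from the slide.

```latex
\begin{aligned}
\text{Optimization:}\quad
  & \theta^\ast = \arg\min_{\theta \in \mathbb{R}^d}\;
    \frac{1}{n}\sum_{i=1}^{n} \bigl(y_i - \theta^\top x_i\bigr)^2
    + \lambda \lVert \theta \rVert_2^2 \\
\text{Zero gradient:}\quad
  & -\frac{2}{n} X^\top (Y - X\theta) + 2\lambda\theta = 0 \\
\text{Linear system:}\quad
  & \Bigl(\tfrac{1}{n} X^\top X + \lambda I\Bigr)\,\theta = \tfrac{1}{n} X^\top Y
\end{aligned}
```

Setting the gradient of the objective to zero turns the optimization problem into a $d \times d$ linear system, which is the form the later slides solve.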

  8. Contributions
     • Secure computation for ridge regression for vertically partitioned database
         o Two phase protocol:
             • Phase 1 – compute $A = \frac{1}{n} X^T X + \lambda I$, $b = \frac{1}{n} X^T Y$
                 o Output is additively shared between two parties
             • Phase 2 – solve $A\theta = b$ where $A$ and $b$ are shared between two parties
     • Two party and multiparty protocol for Phase 1
         o Two party inner product computation
     • Three algorithms for Phase 2:
         o Cholesky, LDLT, Conjugate Gradient Descent (CGD)
     • Implementation and evaluation

  9. Phase 1
     • Compute $A = \frac{1}{n} X^T X + \lambda I$, $b = \frac{1}{n} X^T Y$
     • The output is additively shared between two parties
     • Each entry of $A$ is a dot product of the vectors held by two different parties
     • In the multi-party case too
     • Two party computation of dot product
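
To make the "each entry is a dot product" observation concrete, here is a minimal cleartext sketch of how the Gram matrix of a vertically partitioned dataset splits into additive shares. It is a simulation, not a secure protocol: the cross-party entries are computed in the clear and then split with a random mask, standing in for the output of a secure dot product. The names `share_gram`, `owner`, and the modulus `Q` are hypothetical, and scaling by 1/n and adding λI are omitted.

```python
import secrets

Q = 2**61 - 1  # hypothetical modulus for additive shares (assumption)


def dot(u, v):
    """Plain integer dot product."""
    return sum(a * b for a, b in zip(u, v))


def share_gram(columns, owner, q=Q):
    """Return additive shares (s1, s2) of the Gram matrix X^T X modulo q.

    columns[j] is the j-th column of X (integers); owner[j] is 1 or 2.
    Entries whose two columns belong to one party stay with that party.
    Cross-party entries are split as (r, value - r); in a real protocol
    this split would be produced by a secure dot product instead.
    """
    d = len(columns)
    s1 = [[0] * d for _ in range(d)]
    s2 = [[0] * d for _ in range(d)]
    for j in range(d):
        for k in range(d):
            value = dot(columns[j], columns[k]) % q
            if owner[j] == 1 and owner[k] == 1:
                s1[j][k] = value          # local to party 1
            elif owner[j] == 2 and owner[k] == 2:
                s2[j][k] = value          # local to party 2
            else:
                r = secrets.randbelow(q)  # random mask
                s1[j][k] = r
                s2[j][k] = (value - r) % q
    return s1, s2
```

Adding the two share matrices entrywise modulo `Q` reconstructs the Gram matrix, while each share of a cross-party entry on its own is uniformly random.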

  10. Phase 1
     • Architecture – inspired by [NWIJBT13]
     • Two additional semi-honest, non-colluding parties:
         o Crypto Service Provider (CSP) – generates parameters
         o Evaluator – helps for the evaluation of the protocols, has no inputs
     • Our setting
     [Diagram panels: Many Parties, Two Parties]

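  11. Phase 1
     • Two parties: dot product protocol between A (input $a$) and B (input $b$)
         o Correlated randomness: A holds $(x, r)$, B holds $(y, z)$ with $z = \langle x, y \rangle - r$
         o B sends $b' = b - y$
         o A sends $a' = a + x$ and $a'' = \langle a, b' \rangle - r - r_A$
         o B computes $r_B = \langle a', y \rangle + a'' - z$
         o Outputs: $r_A$ for A, $r_B$ for B
     • Many parties: garbled circuit (diagram labels: OT, garbled labels)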

  12. Phase 2
     • Two party protocol
         o Inputs: additive shares of matrix $A$ and vector $b$
         o Outputs: additive shares of $\theta$ such that $A\theta = b$
     • Garbled circuits computation
     • Solution algorithms
         o Two exact algorithms: Cholesky, LDLT
         o One approximation algorithm: Conjugate Gradient Descent (CGD)
     • [NWIJBT13] implements Cholesky

  13. Cholesky
     • Cholesky decomposition for positive definite matrices
         o $A = LL^T$
         o $L$: $d \times d$ lower triangular matrix
     • Idea: solve $LL^T\theta = b$
         o $L\theta' = b$ (forward substitution)
         o $L^T\theta = \theta'$ (backward substitution)
     • Complexity: $O(d^3)$ floating point operations
     • Two properties:
         o Data-agnostic – no pivoting
         o Numerically robust – suitable for finite precision implementations
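
The steps on this slide can be sketched in plain floating-point Python. This is a cleartext illustration only, not the garbled-circuit implementation from the talk, and the function names are hypothetical. Note that every loop bound depends only on the dimension d, which is the data-agnostic property the slide mentions.

```python
import math


def cholesky(a):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    d = len(a)
    l = [[0.0] * d for _ in range(d)]
    for j in range(d):
        l[j][j] = math.sqrt(a[j][j] - sum(l[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, d):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def forward_substitution(l, b):
    """Solve L t = b for lower-triangular L."""
    t = []
    for i in range(len(b)):
        t.append((b[i] - sum(l[i][k] * t[k] for k in range(i))) / l[i][i])
    return t


def backward_substitution(l, t):
    """Solve L^T theta = t, reading L^T entries as l[k][i]."""
    d = len(t)
    theta = [0.0] * d
    for i in reversed(range(d)):
        s = sum(l[k][i] * theta[k] for k in range(i + 1, d))
        theta[i] = (t[i] - s) / l[i][i]
    return theta


def cholesky_solve(a, b):
    """Solve A theta = b via A = L L^T."""
    l = cholesky(a)
    return backward_substitution(l, forward_substitution(l, b))
```

Calling `cholesky_solve([[4.0, 2.0], [2.0, 3.0]], [10.0, 8.0])` returns a vector close to `[1.75, 1.5]`.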

  14. LDLT
     • Variant of Cholesky decomposition
         o $A = LDL^T$
         o $L$ – lower triangular
         o $D$ – diagonal, non-negative entries
     • Idea: solve $LDL^T\theta = b$
         o $L\theta'' = b$
         o $D\theta' = \theta''$
         o $L^T\theta = \theta'$
     • Complexity: $O(d^3)$
         o No square root
         o Additional substitution phase
     • Same properties
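
A cleartext sketch of the three-stage LDLT solve follows. It assumes the common convention that L has a unit diagonal, which the slide does not state, and it assumes the diagonal entries of D are nonzero. The function names are hypothetical. Compared with Cholesky, the square root disappears and a diagonal scaling stage is added.

```python
def ldlt(a):
    """Return (L, diag) with A = L D L^T, L unit lower triangular (assumption)."""
    d = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    diag = [0.0] * d
    for j in range(d):
        diag[j] = a[j][j] - sum(l[j][k] ** 2 * diag[k] for k in range(j))
        for i in range(j + 1, d):
            s = sum(l[i][k] * l[j][k] * diag[k] for k in range(j))
            l[i][j] = (a[i][j] - s) / diag[j]
    return l, diag


def ldlt_solve(a, b):
    """Solve A theta = b through L t2 = b, D t1 = t2, L^T theta = t1."""
    l, diag = ldlt(a)
    d = len(b)

    # Forward substitution with unit-diagonal L.
    t2 = []
    for i in range(d):
        t2.append(b[i] - sum(l[i][k] * t2[k] for k in range(i)))

    # Diagonal scaling.
    t1 = [t2[i] / diag[i] for i in range(d)]

    # Backward substitution with L^T.
    theta = [0.0] * d
    for i in reversed(range(d)):
        theta[i] = t1[i] - sum(l[k][i] * theta[k] for k in range(i + 1, d))
    return theta
```

On the system `[[4.0, 2.0], [2.0, 3.0]]`, `[10.0, 8.0]` this gives the same solution as a Cholesky solve, approximately `[1.75, 1.5]`.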

  15. CGD
     • Approximate solution
     • Solving $A\theta = b$ by solving the optimization $\arg\min_\theta \lVert A\theta - b \rVert^2$
     • Iterative solutions approach based on conjugate gradients
     • Complexity
         o Until convergence $O(d^3)$
         o Early termination $O(d^2)$ per iteration
     • Error: $\varepsilon$ after $O(\sqrt{\kappa} \log 1/\varepsilon)$ iterations
         o $\kappa$ - condition number
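
As an illustration of the iterative approach, below is the textbook conjugate gradient method for a symmetric positive definite matrix, run for a fixed number of iterations. It is a floating-point sketch and may differ in detail from the variant used in the talk; the function names are hypothetical. Each iteration costs one matrix-vector product, which is where the O(d^2) per-iteration figure comes from.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(a, v):
    return [dot(row, v) for row in a]


def conjugate_gradient(a, b, iterations):
    """Approximate the solution of A theta = b for symmetric positive definite A."""
    theta = [0.0] * len(b)
    r = list(b)            # residual b - A theta for theta = 0
    p = list(r)            # first search direction
    rs = dot(r, r)
    for _ in range(iterations):
        if rs == 0.0:      # already exact; avoids dividing by zero
            break
        ap = matvec(a, p)  # the O(d^2) step
        alpha = rs / dot(p, ap)
        theta = [t + alpha * pi for t, pi in zip(theta, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return theta
```

In exact arithmetic the method reaches the solution after at most d iterations, so `conjugate_gradient([[4.0, 2.0], [2.0, 3.0]], [10.0, 8.0], 2)` is already close to `[1.75, 1.5]`. Stopping earlier trades accuracy for a smaller computation.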

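  16. Fixed-Point Arithmetic
     Encoding maps: $\mathbb{R} \xrightarrow{\phi_\delta} \mathbb{Z} \xrightarrow{\varphi_q} \mathbb{Z}_q$, with decoding maps $\tilde{\varphi}_q$ and $\tilde{\phi}_\delta$ in the reverse direction
     • $\phi_\delta(r) = [r/\delta]$; $\tilde{\phi}_\delta(z) = z\delta$, $|r - \tilde{\phi}_\delta(\phi_\delta(r))| \le \delta$
     • $\varphi_q(z) = z$ if $z \ge 0$; $\varphi_q(z) = z + q$ if $z < 0$
     • $\tilde{\varphi}_q(u) = u$ if $0 \le u \le q/2$; $\tilde{\varphi}_q(u) = u - q$ if $q/2 < u \le q - 1$
     • Phase 1: n-dim vectors with entries of size $R$
         o Error: $n(2R\delta + \delta^2)$
         o Normalize $R \le 1/\sqrt{n}$ ⇒ error $\varepsilon$ with $\delta = \varepsilon/(2\sqrt{n})$ and $q = 8n/\varepsilon^2$
     • $O(\log(n/\varepsilon))$ bit representation
     • Phase 2 – experiments
         o $q = 2^{32}$ (4 bits integer part, 1 bit sign) ⇒ $\delta = 2^{-27}$
         o $q = 2^{64}$ (4 bits integer part, 1 bit sign) ⇒ $\delta = 2^{-59}$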
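
The encoding and decoding maps can be sketched in a few lines. Two assumptions are made here: the bracket in [r/δ] is read as rounding to the nearest integer, and the function names `encode` and `decode` are hypothetical. The defaults use the 32-bit parameters from the slide, q = 2^32 and δ = 2^-27.

```python
def encode(r, delta=2.0 ** -27, q=2 ** 32):
    """Real -> Z -> Z_q: round r/delta, then map negatives to z + q."""
    z = round(r / delta)   # assumption: [.] means round to nearest
    return z % q           # Python's % yields z + q for -q < z < 0


def decode(u, delta=2.0 ** -27, q=2 ** 32):
    """Z_q -> Z -> real: values above q/2 represent negatives."""
    z = u if u <= q // 2 else u - q
    return z * delta


def add_encoded(u, v, q=2 ** 32):
    """Addition of encoded values is addition modulo q."""
    return (u + v) % q
```

For example, `decode(add_encoded(encode(1.25), encode(-0.5)))` gives `0.75`, and a value such as `0.1` that is not a multiple of δ round-trips to within δ of the original.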

  17. Implementation and Evaluation
     • Obliv-C
         o Most recent optimizations: Free XOR, Garbled Row Reduction, Fixed Key Block Ciphers, Half Gates
     • Fixed point arithmetic on top of Obliv-C
         o Algorithms: multiplication (Karatsuba-Comba), division (Knuth’s algorithm D), square root (Newton’s method)
         o 32 bits: 4 bits (integral part) + 28 bits (fractional part)
     • Synthetic datasets (vs real datasets)
         o Generated with correct $\lambda$ parameter – sample from d-dimensional Gaussian distribution
         o Tuning $\lambda$ privately is a hard question – incorrect $\lambda$ makes the optimization too easy or too difficult
     • Amazon EC2 C4 (15GB RAM, 8 CPU cores)

  18. Phase 1
     [Figure: normalized computation time vs. number of parties (2, 3, 4); database partitioned equally among parties; series: Trusted Initializer, Parties (average); legend (n, d): column1 (2000, 20), column2 (10000, 100), column3 (50000, 500)]

     Number of parties (two values per party count, as listed on the slide):

     d   | 2            | 3            | 4
     20  | 0.17  0.033  | 0.22  0.032  | 0.26  0.030
     100 | 19    1.7    | 26    1.6    | 29    1.4
     500 | 109   146    | 149   125    | 166   104

  19. Phase 2
     [Figure: circuit size (10^6 to 10^11) vs. size d (10^1 to 10^2); series: CGD 1, CGD 10, CGD 15, Cholesky]

  20. Phase 2
     [Figures: Convergence of CGD; Fixed vs Floating Point]
