Secure Linear Regression on Vertically Partitioned Datasets
Adria Gascon, Phillipp Schoppmann, Borja Balle, Mariana Raykova, Samee Zahur, Jack Doerner, David Evans
Cryptography in the RAM Computation Model, 6/18/16

Predictive Model

         Blood Count    Heart Conditions      Digestive Tract                 Medicine
Patient  RBC    WBC     Murmur  Arrhythmia    Inflammation  Dysphagia    …    Effectiveness
A        3.9    10.0    0       0             0             1            …    1
B        5.0    4.5     1       0             1             2            …    1.5
C        2.5    11      0       1             1             0            …    2
D        4.3    5.3     2       1             0             1            …    1
…        …      …       …       …             …             …            …    …

• Given samples $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
  o $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$
• Learn a function $f$ such that $f(x_i) = y_i$

Linear Regression
(same patient table as on the Predictive Model slide)
• Given samples $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
  o $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$
• Learn a function $f$ such that $f(x_i) = y_i$
• $f$ is well approximated by a linear map: $y_i \approx \theta^T x_i$

Secure Computation
(same patient table as on the Predictive Model slide)
• Shared database: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ do not belong to the same party
• Compute $\theta$ securely ($y_i \approx \theta^T x_i$)

Horizontally Partitioned Database
(same patient table as on the Predictive Model slide)
• Different rows belong to different parties
  o E.g., each patient has their own information

Vertically Partitioned Database
(same patient table as on the Predictive Model slide)
• Different columns belong to different parties
  o E.g., different specialized hospitals have different parts of the information for all patients

Ridge Regression
• Computing a linear model on inputs $(x_1, y_1), \ldots, (x_n, y_n)$
  o $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$
• Optimization formulation
• Linear system formulation
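For reference, the two formulations named on this slide can be written out as below. This is the textbook ridge regression derivation, not a transcription of the slide; the unnormalized objective is an assumption chosen so that the resulting system matches $A = X^T X + \lambda I$ and $b = X^T Y$ as used in the two-phase protocol. Here $X$ is the $n \times d$ matrix with rows $x_i^T$ and $Y$ is the vector of the $y_i$.

```latex
% Optimization formulation (assumed unnormalized form)
\theta^{*} = \arg\min_{\theta} \; \sum_{i=1}^{n} \left( y_i - \theta^{T} x_i \right)^2 + \lambda \|\theta\|^2
           = \arg\min_{\theta} \; \| Y - X\theta \|^2 + \lambda \|\theta\|^2

% Setting the gradient to zero
-2 X^{T} (Y - X\theta) + 2 \lambda \theta = 0

% Linear system formulation
\left( X^{T} X + \lambda I \right) \theta = X^{T} Y
```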

Contributions
• Secure computation of ridge regression for vertically partitioned databases
  o Two-phase protocol:
    • Phase 1: compute $A = X^T X + \lambda I$, $b = X^T Y$
      o Output is additively shared between two parties
    • Phase 2: solve $A\theta = b$, where $A$ and $b$ are shared between two parties
• Two-party and multiparty protocols for Phase 1
  o Two-party inner product computation
• Three algorithms for Phase 2:
  o Cholesky, LDLT, Conjugate Gradient Descent (CGD)
• Implementation and evaluation
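The phrase "additively shared between two parties" can be illustrated with a minimal sketch of additive secret sharing over the integers modulo $q$. The modulus `Q` and the function names `share`, `reconstruct`, and `add_shares` are assumptions made for illustration; the slides do not specify an implementation.

```python
import secrets

Q = 2**32  # assumed modulus, matching one of the ring sizes used in the experiments


def share(value, q=Q):
    """Split value into two additive shares that sum to value modulo q."""
    s0 = secrets.randbelow(q)   # uniformly random share for the first party
    s1 = (value - s0) % q       # second share completes the sum
    return s0, s1


def reconstruct(s0, s1, q=Q):
    """Recombine two additive shares."""
    return (s0 + s1) % q


def add_shares(x, y, q=Q):
    """Add two shared values locally: each party adds its own shares."""
    return ((x[0] + y[0]) % q, (x[1] + y[1]) % q)


a_shares = share(1234)
b_shares = share(4321)
total = reconstruct(*add_shares(a_shares, b_shares))
```

Each share on its own is uniformly distributed, so neither party learns the value from its share alone, while sums of shared values can be computed without interaction.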

Phase 1
• Compute $A = X^T X + \lambda I$, $b = X^T Y$
• The output is additively shared between two parties
• Each entry of $A$ is a dot product of vectors held by two different parties
  o In the multi-party case too
• Two-party computation of the dot product
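A plaintext sketch, with no cryptography, of why Phase 1 reduces to dot products: entry $(j, k)$ of $X^T X$ is the dot product of columns $j$ and $k$ of $X$, so whenever those columns belong to different parties the entry needs a two-party dot product. The toy data, the `owner` list, and the function names are hypothetical.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def ridge_system(columns, y, lam):
    """Build A = X^T X + lam * I and b = X^T y from the columns of X."""
    d = len(columns)
    A = [[dot(columns[j], columns[k]) + (lam if j == k else 0.0)
          for k in range(d)] for j in range(d)]
    b = [dot(columns[j], y) for j in range(d)]
    return A, b


def cross_party_entries(owner):
    """Index pairs (j, k) whose columns are held by different parties."""
    d = len(owner)
    return [(j, k) for j in range(d) for k in range(d) if owner[j] != owner[k]]


# Toy vertically partitioned data: party 0 holds columns 0 and 1, party 1 holds column 2.
columns = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [2.0, 0.0, 1.0]]
y = [1.0, 0.0, 2.0]
owner = [0, 0, 1]

A, b = ridge_system(columns, y, lam=0.1)
needs_protocol = cross_party_entries(owner)
```

Entries whose two columns sit with the same party can be computed locally; only the pairs returned by `cross_party_entries` require interaction.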

Phase 1
• Architecture inspired by [NWIJBT13]
• Two additional semi-honest, non-colluding parties:
  o Crypto Service Provider (CSP): generates parameters
  o Evaluator: assists in evaluating the protocols, has no inputs
• Our setting
[Figure: architecture diagrams for the many-party and two-party settings]
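
Phase 1
[Figure: protocol diagrams for the two-party and many-party settings; diagram labels: garbled circuit, OT, garbled labels]
Dot product protocol (party A holds $a$, party B holds $b$):
• Correlated randomness: A receives $x, r$; B receives $y$ and $z = \langle x, y \rangle - r$
• B sends $b' = b - y$
• A sends $a' = a + x$ and $a'' = \langle a, b' \rangle - r - r_A$
• A outputs $r_A$; B outputs $r_B = \langle a', y \rangle + a'' - z$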

Phase 2
• Two-party protocol
  o Inputs: additive shares of matrix $A$ and vector $b$
  o Outputs: additive shares of $\theta$ such that $A\theta = b$
• Garbled circuits computation
• Solution algorithms
  o Two exact algorithms: Cholesky, LDLT
  o One approximation algorithm: Conjugate Gradient Descent (CGD)
• [NWIJBT13] implements Cholesky

Cholesky
• Cholesky decomposition for positive definite matrices
  o $A = LL^T$
  o $L$: $d \times d$ lower triangular matrix
• Idea: solve $LL^T \theta = b$
  o $L\theta' = b$ (forward substitution)
  o $L^T \theta = \theta'$ (backward substitution)
• Complexity: $O(d^3)$ floating point operations
• Two properties:
  o Data-agnostic: no pivoting
  o Numerically robust: suitable for finite precision implementations
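A plaintext sketch of the Cholesky solve described on this slide, using floating point rather than the secure fixed-point arithmetic of the actual protocol. The function names are illustrative. Note that the loop bounds depend only on $d$, which is the data-agnostic property the slide mentions.

```python
import math


def cholesky(A):
    """Return lower triangular L with A = L L^T for a positive definite A."""
    d = len(A)
    L = [[0.0] * d for _ in range(d)]
    for j in range(d):
        # Diagonal entry: the only place a square root is needed.
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, d):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_substitution(L, b):
    """Solve L t = b for lower triangular L."""
    d = len(b)
    t = [0.0] * d
    for i in range(d):
        t[i] = (b[i] - sum(L[i][k] * t[k] for k in range(i))) / L[i][i]
    return t


def backward_substitution_transposed(L, t):
    """Solve L^T theta = t, reading L^T entries directly from L."""
    d = len(t)
    theta = [0.0] * d
    for i in reversed(range(d)):
        s = sum(L[k][i] * theta[k] for k in range(i + 1, d))
        theta[i] = (t[i] - s) / L[i][i]
    return theta


def solve_cholesky(A, b):
    """Solve A theta = b via A = L L^T."""
    L = cholesky(A)
    return backward_substitution_transposed(L, forward_substitution(L, b))


theta = solve_cholesky([[4.0, 2.0], [2.0, 3.0]], [2.0, 1.0])
```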

LDLT
• Variant of Cholesky decomposition
  o $A = LDL^T$
  o $L$: lower triangular
  o $D$: diagonal, non-negative entries
• Idea: solve $LDL^T \theta = b$
  o $L\theta'' = b$
  o $D\theta' = \theta''$
  o $L^T \theta = \theta'$
• Complexity: $O(d^3)$
  o No square root
  o Additional substitution phase
• Same properties
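A plaintext sketch of the LDLT variant, again in floating point and with illustrative names. It assumes the common convention that $L$ has a unit diagonal, which the slide does not state explicitly. Compared with Cholesky there is no square root, at the cost of the extra diagonal solve.

```python
def ldlt(A):
    """Return (L, D) with A = L diag(D) L^T, L unit lower triangular."""
    d = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    D = [0.0] * d
    for j in range(d):
        D[j] = A[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, d):
            s = sum(L[i][k] * L[j][k] * D[k] for k in range(j))
            L[i][j] = (A[i][j] - s) / D[j]
    return L, D


def solve_ldlt(A, b):
    """Solve A theta = b through the three substitution phases."""
    L, D = ldlt(A)
    d = len(b)
    # Phase 1: L t2 = b (forward substitution, unit diagonal so no division).
    t2 = [0.0] * d
    for i in range(d):
        t2[i] = b[i] - sum(L[i][k] * t2[k] for k in range(i))
    # Phase 2: D t1 = t2 (diagonal solve).
    t1 = [t2[i] / D[i] for i in range(d)]
    # Phase 3: L^T theta = t1 (backward substitution).
    theta = [0.0] * d
    for i in reversed(range(d)):
        theta[i] = t1[i] - sum(L[k][i] * theta[k] for k in range(i + 1, d))
    return theta


L, D = ldlt([[4.0, 2.0], [2.0, 3.0]])
theta = solve_ldlt([[4.0, 2.0], [2.0, 3.0]], [2.0, 1.0])
```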

CGD
• Approximate solution
• Solve $A\theta = b$ by solving the optimization $\arg\min_\theta \|A\theta - b\|^2$
• Iterative solution approach based on conjugate gradients
• Complexity
  o Until convergence: $O(d^3)$
  o Early termination: $O(d^2)$ per iteration
• Error: $\varepsilon$ after $O(\sqrt{\kappa} \log(1/\varepsilon))$ iterations
  o $\kappa$: condition number
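A plaintext sketch of the standard conjugate gradient iteration for a symmetric positive definite system, with a fixed iteration budget to mirror early termination. The names are illustrative, and the zero-residual check is a convenience for this floating-point version only; a data-oblivious circuit would presumably run the fixed number of iterations unconditionally.

```python
def dot(u, v):
    """Plain dot product."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Matrix-vector product; this is the O(d^2) step of each iteration."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, iterations):
    """Approximate the solution of A theta = b, starting from theta = 0."""
    theta = [0.0] * len(b)
    r = list(b)          # residual b - A theta for theta = 0
    p = list(r)          # first search direction
    rs = dot(r, r)
    for _ in range(iterations):
        if rs == 0.0:    # exact solution reached; avoid dividing by zero
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)                       # step length along p
        theta = [t + alpha * pi for t, pi in zip(theta, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs                            # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return theta


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
one_step = conjugate_gradient(A, b, iterations=1)
two_steps = conjugate_gradient(A, b, iterations=2)
```

In exact arithmetic the method reaches the solution after at most $d$ iterations, so for this $2 \times 2$ example two iterations already give $\theta = (1/11, 7/11)$ up to rounding.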
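
Fixed-Point Arithmetic
Encoding chain: $\mathbb{R} \xrightarrow{\phi_\delta} \mathbb{Z} \xrightarrow{\varphi_q} \mathbb{Z}_q$, with inverse maps $\tilde{\varphi}_q$ and $\tilde{\phi}_\delta$
• $\phi_\delta(r) = [r/\delta]$; $\tilde{\phi}_\delta(z) = z\delta$; $|r - \tilde{\phi}_\delta(\phi_\delta(r))| \le \delta$
• $\varphi_q(z) = z$ if $z \ge 0$; $\varphi_q(z) = z + q$ if $z < 0$
• $\tilde{\varphi}_q(u) = u$ if $0 \le u \le q/2$; $\tilde{\varphi}_q(u) = u - q$ if $q/2 < u \le q - 1$
• Phase 1: $n$-dimensional vectors with entries of size $R$
  o Error: $n(2R\delta + \delta^2)$
  o Normalize $R \le 1/\sqrt{n}$ ⇒ error $\varepsilon$ with $\delta = \varepsilon/(2\sqrt{n})$ and $q = 8n/\varepsilon^2$
• $O(\log(n/\varepsilon))$-bit representation
• Phase 2 (experiments)
  o $q = 2^{32}$ (4 bits integer part, 1 bit sign) ⇒ $\delta = 2^{-27}$
  o $q = 2^{64}$ (4 bits integer part, 1 bit sign) ⇒ $\delta = 2^{-59}$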
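The four encoding maps can be sketched directly in Python. The function names are hypothetical, and reading the bracket $[r/\delta]$ as rounding to the nearest integer is an assumption; flooring would also satisfy the stated error bound of $\delta$.

```python
def encode_real(r, delta):
    """phi_delta: map a real number to an integer multiple of delta (assumed rounding)."""
    return round(r / delta)


def decode_real(z, delta):
    """Inverse of phi_delta: map an integer back to a real number."""
    return z * delta


def to_ring(z, q):
    """varphi_q: represent a signed integer as an element of Z_q."""
    return z if z >= 0 else z + q


def from_ring(u, q):
    """Inverse of varphi_q: the upper half of Z_q encodes negative integers."""
    return u if u <= q // 2 else u - q


q = 2**32
delta = 2.0**-8

# Round trip of a single negative value.
u = to_ring(encode_real(-1.5, delta), q)
back = decode_real(from_ring(u, q), delta)

# A product of two encodings carries scale delta**2 when decoded.
ua = to_ring(encode_real(0.5, delta), q)
ub = to_ring(encode_real(-0.25, delta), q)
product = decode_real(from_ring((ua * ub) % q, q), delta * delta)
```

The product example shows why the modulus has to grow with the precision: multiplying two encodings squares the scale, which is consistent with $q$ being proportional to $1/\delta^2$ in the Phase 1 parameters.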

Implementation and Evaluation
• Obliv-C
  o Most recent optimizations: Free XOR, Garbled Row Reduction, Fixed Key Block Ciphers, Half Gates
• Fixed-point arithmetic on top of Obliv-C
  o Algorithms: multiplication (Karatsuba-Comba), division (Knuth's algorithm D), square root (Newton's method)
  o 32 bits: 4 bits (integral part) + 28 bits (fractional part)
• Synthetic datasets (vs real datasets)
  o Generated with the correct $\lambda$ parameter: sampled from a d-dimensional Gaussian distribution
  o Tuning $\lambda$ privately is a hard question: an incorrect $\lambda$ makes the optimization too easy or too difficult
• Amazon EC2 C4 (15GB RAM, 8 CPU cores)

Phase 1
[Figure: normalized computation time (0.0 to 1.2) versus number of parties (2, 3, 4), with the database partitioned equally among parties; series: Trusted Initializer and Parties (average); datasets (n, d): (2000, 20), (10000, 100), (50000, 500)]

Two values per cell:

         Number of parties
d        2               3               4
20       0.17 / 0.033    0.22 / 0.032    0.26 / 0.030
100      19 / 1.7        26 / 1.6        29 / 1.4
500      109 / 146       149 / 125       166 / 104

Phase 2
[Figure: circuit size ($10^6$ to $10^{11}$) versus size d ($10^1$ to $10^2$); series: CGD 1, CGD 10, CGD 15, Cholesky]

Phase 2
[Figures: Convergence of CGD; Fixed vs Floating Point]