Optimizing MPC for robust and scalable integer and floating-point arithmetic
Liisi Kerik*, Peeter Laud*, Jaak Randmets*†
* Cybernetica AS
† University of Tartu, Institute of Computer Science
January 30, 2016
Introduction
• Secure multiparty computation (SMC)
• Examples: Yao, income study
• Most applications have been run on small data volumes.
• Only one deployment has processed tens of millions of education and income records.
• Performance is a major hurdle.
• In this talk, we will show that SMC can be scalable and robust.
Overview of the talk
• Background
• Improvements in floating-point protocols
• Generic optimization techniques
• Performance results
Secret sharing
• We mostly use additive 3-party secret sharing: $v = (v_1 + v_2 + v_3) \bmod N$.
• Private values are denoted by $\llbracket v \rrbracket$.
• Integer addition $\llbracket w \rrbracket = \llbracket u \rrbracket + \llbracket v \rrbracket$ is local: $\llbracket w \rrbracket_i = (\llbracket u \rrbracket_i + \llbracket v \rrbracket_i) \bmod N$.
• We build integer and floating-point arithmetic on top of this representation.
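A minimal cleartext sketch of additive 3-party sharing, assuming $N = 2^{32}$ (the slides only say "mod N"). The function names `share`, `declassify` and `add` are hypothetical; the point is that addition needs no communication because each party only adds its own two shares.

```python
import secrets

N = 2**32  # assumption: shares live in Z_{2^32}

def share(v: int) -> list[int]:
    """Split v into three additive shares that sum to v mod N."""
    v1 = secrets.randbelow(N)
    v2 = secrets.randbelow(N)
    v3 = (v - v1 - v2) % N
    return [v1, v2, v3]

def declassify(shares: list[int]) -> int:
    """Reconstruct the secret by summing all three shares mod N."""
    return sum(shares) % N

def add(u: list[int], v: list[int]) -> list[int]:
    """Local addition: party i adds its share of u to its share of v."""
    return [(ui + vi) % N for ui, vi in zip(u, v)]

u, v = share(1_000_000), share(2**32 - 5)  # 2**32 - 5 represents -5
w = add(u, v)
print(declassify(w))  # 999995
```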
Representing floating-point numbers
$x = (-1)^s \cdot f \cdot 2^e$
• The sign bit $s$ is 0 for positive and 1 for negative numbers.
• The significand $f \in [0.5, 1)$ is represented as a fixed-point number with 0 bits before the radix point.
• $e$ is the exponent (with a range identical to that of IEEE floats).
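Python's `math.frexp` happens to return a mantissa in exactly the interval $[0.5, 1)$ used above, so a cleartext sketch of the $(s, f, e)$ encoding is short. The helper names `to_sfe`/`from_sfe` and the significand width `n` are assumptions for illustration; this sketch does not encode zero and ignores exponent-range checks.

```python
import math

def to_sfe(x: float, n: int = 32) -> tuple[int, int, int]:
    """Decompose x = (-1)^s * f * 2^e with f in [0.5, 1).
    f is returned as an n-bit fixed-point integer (0 bits before the radix point)."""
    if x == 0.0:
        raise ValueError("this sketch does not encode zero")
    s = 1 if x < 0 else 0
    m, e = math.frexp(abs(x))  # abs(x) = m * 2**e with 0.5 <= m < 1
    f = int(m * 2**n)          # keep n fractional bits (truncating)
    return s, f, e

def from_sfe(s: int, f: int, e: int, n: int = 32) -> float:
    return (-1) ** s * (f / 2**n) * 2.0**e

s, f, e = to_sfe(-6.5)
print(s, f / 2**32, e)  # 1 0.8125 3
```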
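Primitive protocols
• Extend($\llbracket u \rrbracket$, $n$) casts $\llbracket u \rrbracket \in \mathbb{Z}_{2^m}$ to the equal value in $\mathbb{Z}_{2^{n+m}}$.
• Cut($\llbracket u \rrbracket$, $n$) drops the $n$ least-significant bits of $\llbracket u \rrbracket \in \mathbb{Z}_{2^m}$.
  • It can be used to implement division by a power of two.
• MultArr($\llbracket u \rrbracket$, $\{\llbracket v_i \rrbracket\}_{i=1}^{k}$) multiplies $\llbracket u \rrbracket$ point-wise with every $\llbracket v_i \rrbracket$.
  • This is more efficient than multiplying $\llbracket u \rrbracket$ with every $\llbracket v_i \rrbracket$ separately.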
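The sketch below models only what the three primitives compute on the underlying secret value, not how the parties compute it on shares or why MultArr is cheaper. Passing the ring bit-width `m` explicitly is our own assumption for illustration.

```python
# Cleartext models of the primitives: they act on the secret value itself.

def extend(u: int, m: int, n: int) -> int:
    """Extend: u in Z_{2^m} is cast to the same value in Z_{2^(n+m)}."""
    assert 0 <= u < 2**m
    return u  # the value is unchanged; only the ring grows by n bits

def cut(u: int, m: int, n: int) -> int:
    """Cut: drop the n least-significant bits of u in Z_{2^m}.
    This is floor division by 2^n, and the result fits in m - n bits."""
    assert 0 <= u < 2**m
    return u >> n

def mult_arr(u: int, vs: list[int], m: int) -> list[int]:
    """MultArr: multiply u point-wise with every v_i, all in Z_{2^m}."""
    return [(u * v) % 2**m for v in vs]

print(cut(200, 16, 3))            # 25, i.e. 200 // 2**3
print(mult_arr(3, [1, 2, 5], 8))  # [3, 6, 15]
print(mult_arr(16, [16], 8))      # [0], since 256 wraps around in Z_{2^8}
```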
Polynomial evaluation
• We approximate the following floating-point functions with polynomials: sqrt, sin, exp, ln, erf.
• Polynomial evaluation requires additions. Floating-point additions are expensive because of private shifts, whereas fixed-point polynomials can be computed much faster.
• We have improved fixed-point polynomial evaluation.
• Efficiency improvements for a polynomial of degree 16 on a 64-bit fixed-point number:
  • old: 89 rounds, 27 KB of communication;
  • new: 57 rounds, 7.5 KB of communication.
Improvements in precision
Relative errors of inverse and square root:

Function   Old             New
inv32      1.3 · 10^-4     2.69 · 10^-9
inv64      1.3 · 10^-8     7.10 · 10^-19
sqrt32     5.1 · 10^-6     4.92 · 10^-9
sqrt64     4.1 · 10^-11    1.30 · 10^-15
Hacks for faster polynomial evaluation
• Restrict the domain and range to $[0, 1)$. (Coefficients can still be of any size.)
• If we know the argument lies in the range $[2^{-n}k, 2^{-n}(k+1))$, then instead of interpolating $f(x)$ on that range we interpolate $f(2^{-n}(x+k))$ on $[0, 1)$. This gives smaller coefficients and better precision.
• We add a small linear term to the function we interpolate. This gets rid of denormalized results and overflows.
• Instead of using ordinary fixed-point multiplications (extend, multiply, cut), we extend the argument sufficiently at the beginning and later perform only multiplications and cuts.
• At the end, instead of cutting the excess bits and then adding the terms, we add the terms and then cut.
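The last hack can be checked on cleartext integers. Each Cut floors its input, so cutting 17 terms (a degree-16 polynomial) separately can lose up to 17 units in the last place, while cutting their sum once loses less than one. The random terms and the choice $n = 16$ are arbitrary assumptions for this sketch.

```python
import random

def cut(v: int, n: int) -> int:
    """Drop the n least-significant bits (floor division by 2^n)."""
    return v >> n

def cut_then_add(terms: list[int], n: int) -> int:
    return sum(cut(t, n) for t in terms)

def add_then_cut(terms: list[int], n: int) -> int:
    return cut(sum(terms), n)

n = 16
rng = random.Random(1)
# 17 terms with 2n fractional bits, as produced by fixed-point products
terms = [rng.randrange(2 ** (2 * n)) for _ in range(17)]
exact = sum(terms) / 2**n  # exact value, in units of 2^-n

print(exact - add_then_cut(terms, n))  # error lies in [0, 1)
print(exact - cut_then_add(terms, n))  # error lies in [0, 17)
```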
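Powers of a fixed-point number
Algorithm PowArr($\llbracket x \rrbracket$, $k$, $n$, $n'$)
Data: $\llbracket x \rrbracket$ (0 bits before, $n$ bits after the radix point)
Result: $\{\llbracket x^i \rrbracket\}_{i=1}^{k}$ ($n' + n$ bits before, $n$ bits after the radix point)
  if $k = 0$ then
      return $\{\}$
  else
      $l \leftarrow \lceil \log_2 k \rceil$
      $\llbracket x^1 \rrbracket \leftarrow$ Extend($\llbracket x \rrbracket$, $n' + (l+1)n$)
      for $i \leftarrow 0$ to $l - 1$ do
          $\{\llbracket x^j \rrbracket\}_{j=2^i+1}^{2^{i+1}} \leftarrow$ MultArr($\llbracket x^{2^i} \rrbracket$, $\{\llbracket x^j \rrbracket\}_{j=1}^{2^i}$)
          for $j \leftarrow 2^i + 1$ to $2^{i+1}$ do in parallel
              $\llbracket x^j \rrbracket \leftarrow$ Cut($\llbracket x^j \rrbracket$, $n$)
      return $\{\llbracket x^i \rrbracket\}_{i=1}^{k}$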
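A cleartext sketch of PowArr on plain integers, assuming $x$ carries $n$ fractional bits. It mirrors the doubling structure ($\lceil \log_2 k \rceil$ rounds of MultArr, each followed by a Cut of only the new powers) but does not model ring sizes or secret sharing; the function name `pow_arr` is hypothetical.

```python
def pow_arr(x: int, k: int, n: int) -> list[int]:
    """Cleartext model of PowArr.
    x encodes x / 2^n in [0, 1); returns [x^1, ..., x^k], each with n fractional bits.
    Ring sizes (Extend by n' + (l + 1)n bits) are not modelled: Python ints never overflow."""
    if k == 0:
        return []
    l = (k - 1).bit_length()  # equals ceil(log2 k) for k >= 1
    pows = [x]                # pows[j - 1] holds x^j
    for i in range(l):
        half = 2**i
        # MultArr(x^(2^i), {x^1, ..., x^(2^i)}) gives x^(2^i + 1), ..., x^(2^(i+1)) with 2n fractional bits
        products = [pows[half - 1] * p for p in pows[:half]]
        # Cut only the new powers back to n fractional bits
        pows += [p >> n for p in products]
    return pows[:k]

n = 32
x = 3 << 30  # 0.75 with 32 fractional bits
approx = [p / 2**n for p in pow_arr(x, 5, n)]
print(approx[:2])  # [0.75, 0.5625]
```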
Fixed-point polynomial evaluation
Data: $\llbracket x \rrbracket$ (0 bits before, $n$ bits after the radix point), $\{c_i\}_{i=0}^{k}$ ($n' + n$ bits before, $n$ bits after the radix point, highest $n$ bits empty)
Result: Sum($\{c_i \cdot \llbracket x^i \rrbracket\}_{i=0}^{k}$) (0 bits before, $n$ bits after the radix point)
  $\{\llbracket x^i \rrbracket\}_{i=1}^{k} \leftarrow$ PowArr($\llbracket x \rrbracket$, $k$, $n$, $n'$)
  $\llbracket z_0 \rrbracket \leftarrow$ Share($c_0$)
  for $i \leftarrow 1$ to $k$ do in parallel
      $\llbracket z_i \rrbracket \leftarrow c_i \cdot \llbracket x^i \rrbracket$
  for $i \leftarrow 0$ to $k$ do in parallel
      $\llbracket z'_i \rrbracket \leftarrow$ Trunc($\llbracket z_i \rrbracket$, $n'$)
  return Cut(Sum($\{\llbracket z'_i \rrbracket\}_{i=0}^{k}$), $n$)
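A cleartext sketch of the evaluation above, using a degree-9 Taylor series of sin on $[0, 1)$ as an example polynomial (our choice, not from the slides). Assumptions: a plain loop stands in for PowArr, $c_0$ is shifted left by $n$ bits so it aligns with the $2n$ fractional bits of the products, and Trunc by $n'$ is omitted because Python integers have no fixed ring size.

```python
import math

def poly_eval_fixed(x: int, coeffs: list[int], n: int) -> int:
    """Cleartext model: x and every coeffs[i] carry n fractional bits;
    the result also carries n fractional bits."""
    k = len(coeffs) - 1
    # Powers x^1..x^k with n fractional bits (a plain loop stands in for PowArr)
    pows, p = [], x
    for _ in range(k):
        pows.append(p)
        p = (p * x) >> n
    # z_0 is aligned to 2n fractional bits so it can be summed with the products
    z = [coeffs[0] << n] + [c * xp for c, xp in zip(coeffs[1:], pows)]
    # Add first, then a single Cut by n bits
    return sum(z) >> n

n = 32
taylor_sin = [0, 1, 0, -1 / 6, 0, 1 / 120, 0, -1 / 5040, 0, 1 / 362880]
coeffs = [round(c * 2**n) for c in taylor_sin]
x = int(0.5 * 2**n)
print(poly_eval_fixed(x, coeffs, n) / 2**n, math.sin(0.5))  # close to each other
```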
New floating-point protocols: sine
• Reduce the argument to the range $(-2\pi, 2\pi)$.
• Use the identities $\sin(-x) = -\sin x$, $\sin(x + \pi) = -\sin x$ and $\sin(\pi/2 - x) = \sin(\pi/2 + x)$.
• Polynomial approximation.
• Near zero, we use $\sin x \approx x$ for better precision.
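A cleartext floating-point sketch of how the three identities fold an argument from $(-2\pi, 2\pi)$ into $[0, \pi/2]$. It assumes the first reduction to $(-2\pi, 2\pi)$ has already happened. A degree-11 Taylor polynomial stands in for the fitted polynomial used by the protocol, and the "near zero" threshold `NEAR_ZERO` is hypothetical.

```python
import math

# Coefficients of sin y = y * (1 - y^2/3! + y^4/5! - ... - y^10/11!)
TAYLOR = [(-1) ** j / math.factorial(2 * j + 1) for j in range(6)]
NEAR_ZERO = 2.0**-20  # hypothetical threshold for "near zero"

def sin_core(y: float) -> float:
    """sin on [0, pi/2]; a Taylor polynomial stands in for the fitted one."""
    if y < NEAR_ZERO:
        return y  # sin y ~ y near zero
    y2 = y * y
    acc = 0.0
    for c in reversed(TAYLOR):  # Horner's scheme in y^2
        acc = acc * y2 + c
    return y * acc

def sin_sketch(x: float) -> float:
    """sin x for x in (-2*pi, 2*pi), folded into [0, pi/2]."""
    assert -2 * math.pi < x < 2 * math.pi
    sign = 1.0
    if x < 0:            # sin(-x) = -sin x
        x, sign = -x, -sign
    if x >= math.pi:     # sin(x + pi) = -sin x
        x, sign = x - math.pi, -sign
    if x > math.pi / 2:  # sin(pi/2 - t) = sin(pi/2 + t), hence sin x = sin(pi - x)
        x = math.pi - x
    return sign * sin_core(x)
```

For example, `sin_sketch(4.0)` first maps 4.0 to $4 - \pi$ with a sign flip, which already lies in $[0, \pi/2]$.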
New floating-point protocols: logarithm
• $\log_2(2^e \cdot f) = e + \log_2 f$.
• $e + \log_2 f = (e - 2) + 2(\log_4 f + 1)$, since $2(\log_4 f + 1) = \log_2 f + 2$. Moreover, $f \in [0.5, 1) \Rightarrow \log_4 f + 1 \in [0.5, 1)$.
• Polynomial approximation. (For double precision, two different polynomials.)
• The end result is computed through floating-point addition.
• Near 1, we use a second-degree Taylor polynomial.
• Conversion: $\ln x = \ln 2 \cdot \log_2 x$.
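A cleartext floating-point sketch of the logarithm decomposition. `math.frexp` yields $f \in [0.5, 1)$ and $e$ directly, and `math.log(f, 4)` stands in for the polynomial approximation. The threshold `NEAR_ONE` that selects the second-degree Taylor branch is a hypothetical value.

```python
import math

NEAR_ONE = 2.0**-12  # hypothetical threshold for "near 1"

def log2_sketch(x: float) -> float:
    """log2 x for x > 0 via log2(2^e * f) = (e - 2) + 2 * (log4 f + 1)."""
    t = x - 1.0
    if abs(t) < NEAR_ONE:
        # second-degree Taylor polynomial of ln(1 + t), converted to base 2
        return (t - t * t / 2) / math.log(2)
    f, e = math.frexp(x)    # x = f * 2**e with 0.5 <= f < 1
    g = math.log(f, 4) + 1  # stand-in for the polynomial; g lies in [0.5, 1)
    return (e - 2) + 2 * g  # a floating-point addition in the protocol

def ln_sketch(x: float) -> float:
    return math.log(2) * log2_sketch(x)  # ln x = ln 2 * log2 x

print(log2_sketch(8.0))  # about 3.0
```

Without the Taylor branch, inputs just above 1 would give $e = 1$ and $g \approx 0.5$, so the final addition $(e - 2) + 2g$ would cancel almost completely.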
Generic optimization techniques
Resharing protocol
Algorithm 1: Resharing protocol.
Data: Shared value $\llbracket u \rrbracket \in R$
Result: Shared value $\llbracket w \rrbracket \in R$ such that $u = w$.
  All parties $P_i$ perform the following:
      $r \leftarrow R$ (random)
      Send $r$ to $P_{p(i)}$
      Receive $r'$ from $P_{n(i)}$
      $\llbracket w \rrbracket_i \leftarrow \llbracket u \rrbracket_i + (r - r')$
  return $\llbracket w \rrbracket$
• $p(i)$ and $n(i)$ denote the previous and the next party of $P_i$.
• Resharing is used to ensure that messages are independent of inputs and outputs.
• All protocols and sub-protocols reshare their inputs.
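A single-process simulation of Algorithm 1, assuming $R = \mathbb{Z}_{2^{32}}$, parties indexed 0 to 2, $n(i) = (i + 1) \bmod 3$ and $p(i) = (i - 1) \bmod 3$. Messages are modelled by an `inbox` list. Because $P_i$ receives exactly the $r$ drawn by $P_{n(i)}$, the random values cancel in the sum of shares.

```python
import secrets

N = 2**32  # assumption: R = Z_{2^32}

def nxt(i: int) -> int:
    return (i + 1) % 3  # n(i): assumed next party

def prv(i: int) -> int:
    return (i - 1) % 3  # p(i): assumed previous party

def reshare(u: list[int]) -> list[int]:
    """Simulate Algorithm 1 for all three parties in one process."""
    r = [secrets.randbelow(N) for _ in range(3)]  # P_i draws r_i uniformly from R
    inbox = [0, 0, 0]
    for i in range(3):
        inbox[prv(i)] = r[i]  # P_i sends r_i to P_{p(i)}
    # inbox[i] is r' received by P_i from P_{n(i)}
    return [(u[i] + r[i] - inbox[i]) % N for i in range(3)]

u = [5, 7, 11]  # shares of 23
w = reshare(u)
print(sum(w) % N)  # 23
```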
Shared random number generators
• A common pattern: generate a random number and send it to some other party.
• We can instead use a common random number generator.
• We perform this optimization automatically (mostly).
• Performance improvements:
  • network communication reduced by 30% to 60%;
  • runtime performance improved by up to 60%.
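A sketch of the idea applied to resharing: the value $P_i$ would send to $P_{p(i)}$ is instead derived by both parties from a seed they share, so the resharing round sends no messages. The generator below (SHA-256 over seed and counter) and the seed setup are hypothetical stand-ins, not the generator used in the implementation. $R = \mathbb{Z}_{2^{32}}$ and $n(i) = (i + 1) \bmod 3$ are assumptions, and each call needs a fresh counter.

```python
import hashlib
import secrets

N = 2**32  # assumption: R = Z_{2^32}

def prg(seed: bytes, counter: int) -> int:
    """Hypothetical PRG: SHA-256 over seed || counter, reduced to an element of R."""
    digest = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") % N

# One-time setup: seeds[i] is known to P_i and to P_{p(i)} = P_{(i-1) mod 3}.
seeds = [secrets.token_bytes(16) for _ in range(3)]

def reshare_without_messages(u: list[int], counter: int) -> list[int]:
    """Resharing where every 'send r to P_{p(i)}' is replaced by a shared PRG draw."""
    w = []
    for i in range(3):
        r = prg(seeds[i], counter)                  # r_i, also computable by P_{p(i)}
        r_prime = prg(seeds[(i + 1) % 3], counter)  # r_{n(i)}: P_i shares seeds[n(i)] with P_{n(i)}
        w.append((u[i] + r - r_prime) % N)
    return w

print(sum(reshare_without_messages([5, 7, 11], counter=0)) % N)  # 23
```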
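Multiplication protocol
Algorithm 2: Multiplication protocol.
Data: Shared values $\llbracket u \rrbracket, \llbracket v \rrbracket \in R$
Result: Shared value $\llbracket w \rrbracket \in R$ such that $u \cdot v = w$.
  $\llbracket u \rrbracket \leftarrow$ Reshare($\llbracket u \rrbracket$)
  $\llbracket v \rrbracket \leftarrow$ Reshare($\llbracket v \rrbracket$)
  All parties $P_i$ perform the following:
      Send $\llbracket u \rrbracket_i$ and $\llbracket v \rrbracket_i$ to $P_{n(i)}$
      Receive $\llbracket u \rrbracket_{p(i)}$ and $\llbracket v \rrbracket_{p(i)}$ from $P_{p(i)}$
      $\llbracket w \rrbracket_i \leftarrow \llbracket u \rrbracket_i \cdot \llbracket v \rrbracket_i + \llbracket u \rrbracket_{p(i)} \cdot \llbracket v \rrbracket_i + \llbracket u \rrbracket_i \cdot \llbracket v \rrbracket_{p(i)}$
  $\llbracket w \rrbracket \leftarrow$ Reshare($\llbracket w \rrbracket$)
  return $\llbracket w \rrbracket$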
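A single-process simulation of Algorithm 2 under the same assumptions as before ($R = \mathbb{Z}_{2^{32}}$, $n(i) = (i + 1) \bmod 3$, $p(i) = (i - 1) \bmod 3$). Summed over the three parties, the local products $u_i v_i$, $u_{p(i)} v_i$ and $u_i v_{p(i)}$ cover each of the nine cross terms $u_a v_b$ exactly once, so the shares of $w$ sum to $u \cdot v$.

```python
import secrets

N = 2**32  # assumption: R = Z_{2^32}

def nxt(i: int) -> int:
    return (i + 1) % 3  # n(i)

def prv(i: int) -> int:
    return (i - 1) % 3  # p(i)

def share(x: int) -> list[int]:
    a, b = secrets.randbelow(N), secrets.randbelow(N)
    return [a, b, (x - a - b) % N]

def reshare(u: list[int]) -> list[int]:
    r = [secrets.randbelow(N) for _ in range(3)]
    return [(u[i] + r[i] - r[nxt(i)]) % N for i in range(3)]

def multiply(u: list[int], v: list[int]) -> list[int]:
    u, v = reshare(u), reshare(v)
    # P_i sends u_i and v_i to P_{n(i)}, so P_i now also holds u_{p(i)} and v_{p(i)}
    w = [(u[i] * v[i] + u[prv(i)] * v[i] + u[i] * v[prv(i)]) % N for i in range(3)]
    return reshare(w)

print(sum(multiply(share(6), share(7))) % N)  # 42
```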
Multiplication protocol
[Figure: communication pattern of the multiplication protocol among parties 1, 2 and 3]
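Communication-symmetric multiplication
Algorithm 3: Symmetric multiplication protocol.
Data: Shared values $\llbracket u \rrbracket, \llbracket v \rrbracket \in R$
Result: Shared value $\llbracket w \rrbracket \in R$ such that $u \cdot v = w$.
  $\llbracket u \rrbracket \leftarrow$ Reshare($\llbracket u \rrbracket$)
  $\llbracket v \rrbracket \leftarrow$ Reshare($\llbracket v \rrbracket$)
  All parties $P_i$ perform the following:
      Send $\llbracket u \rrbracket_i$ to $P_{n(i)}$ and $\llbracket v \rrbracket_i$ to $P_{p(i)}$
      Receive $\llbracket u \rrbracket_{p(i)}$ from $P_{p(i)}$ and $\llbracket v \rrbracket_{n(i)}$ from $P_{n(i)}$
      $\llbracket w \rrbracket_i \leftarrow \llbracket u \rrbracket_i \cdot \llbracket v \rrbracket_i + \llbracket u \rrbracket_{p(i)} \cdot \llbracket v \rrbracket_i + \llbracket u \rrbracket_{p(i)} \cdot \llbracket v \rrbracket_{n(i)}$
  $\llbracket w \rrbracket \leftarrow$ Reshare($\llbracket w \rrbracket$)
  return $\llbracket w \rrbracket$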
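A simulation of Algorithm 3 with the same assumptions as the previous sketches, plus a count of ring elements per directed link (resharing excluded). In Algorithm 2 each $P_i$ sends both values to $P_{n(i)}$, so one direction of every link carries all of the traffic. Here each $P_i$ sends one value to each neighbour, so the total stays the same while both directions carry equal load. The helper `link_load` is hypothetical.

```python
import secrets

N = 2**32  # assumption: R = Z_{2^32}

def nxt(i: int) -> int:
    return (i + 1) % 3  # n(i)

def prv(i: int) -> int:
    return (i - 1) % 3  # p(i)

def share(x: int) -> list[int]:
    a, b = secrets.randbelow(N), secrets.randbelow(N)
    return [a, b, (x - a - b) % N]

def reshare(u: list[int]) -> list[int]:
    r = [secrets.randbelow(N) for _ in range(3)]
    return [(u[i] + r[i] - r[nxt(i)]) % N for i in range(3)]

def multiply_sym(u: list[int], v: list[int]) -> list[int]:
    u, v = reshare(u), reshare(v)
    # P_i receives u_{p(i)} from P_{p(i)} and v_{n(i)} from P_{n(i)}
    w = [(u[i] * v[i] + u[prv(i)] * v[i] + u[prv(i)] * v[nxt(i)]) % N for i in range(3)]
    return reshare(w)

def link_load(destinations) -> dict:
    """Count ring elements sent over each directed link (sender, receiver)."""
    load: dict = {}
    for i in range(3):
        for dest in destinations(i):
            load[(i, dest)] = load.get((i, dest), 0) + 1
    return load

asym = link_load(lambda i: [nxt(i), nxt(i)])  # Algorithm 2: u_i and v_i both go to n(i)
sym = link_load(lambda i: [nxt(i), prv(i)])   # Algorithm 3: u_i to n(i), v_i to p(i)
print(sum(multiply_sym(share(6), share(7))) % N)  # 42
print(max(asym.values()), max(sym.values()))      # 2 1
```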
Balanced communication
[Figure: balanced communication among parties 1, 2 and 3]
Conclusions
• Performance evaluation on vectors of up to $10^9$ elements and up to 1000 repeats.
• Demonstrates scalability and robustness.
• Memory limitations at $10^{10}$ elements.
Results
• We can perform 22 million 32-bit integer multiplications per second; the previous best published result was 8 million.
  • Comparable to a late-generation Intel i486 (1992).
• Up to 230 kFLOPS, comparable to an Intel 80387 (1987).