Privacy-preserving statistical analysis Liina Kamm liina@cyber.ee http://sharemind.cyber.ee/
Rmind: a tool for cryptographically secure statistical analysis Dan Bogdanov, Liina Kamm, Sven Laur and Ville Sokk ePrint report 2014/512
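Sharemind [Figure: three-step deployment model with input parties IP_1, ..., IP_k, computing parties CP_1, CP_2, CP_3, and result parties RP_1, ..., RP_l. Step 1: secret sharing of inputs (each input x_i is split into shares x_i1, x_i2, x_i3, one per computing party). Step 2: secure multiparty computation (the computing parties obtain result shares y_1, y_2, y_3). Step 3: reconstruction of results (the result parties recover y).]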
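The three steps can be illustrated with a minimal sketch, assuming additive secret sharing among three computing parties over the integers modulo 2^32. The modulus and the names `share`, `reconstruct`, and `add_shared` are illustrative assumptions, not the Sharemind API.

```python
import secrets

MODULUS = 2**32  # assumed ring size, chosen only for illustration


def share(value, parties=3):
    """Step 1: split a value into additive shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def add_shared(xs, ys):
    """Step 2 (simplest case): each party adds the two shares it holds locally."""
    return [(x + y) % MODULUS for x, y in zip(xs, ys)]


def reconstruct(shares):
    """Step 3: the result party sums the result shares."""
    return sum(shares) % MODULUS


x_shares = share(42)    # input party 1
y_shares = share(100)   # input party 2
total = reconstruct(add_shared(x_shares, y_shares))
```

Addition needs no communication between the parties in this scheme; operations such as multiplication or comparison require dedicated protocols, which this sketch does not attempt to show.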
Necessary functionality • Classification, declassification and publishing of values • Protected storage of a private value • Support for vectors and matrices • Integer, Boolean, floating-point arithmetic • Division, square root • Shuffling • Linking • Sorting
Filtering [Figure: a database with attributes 1, 2, ..., m and records D1, D2, ..., Dn. In the usual setting, the filtered attribute j is a vector of k elements. In the privacy-preserving setting, the filtered attribute j keeps all n elements and comes with a mask vector of n elements (for example 1 0 1 0).]
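A minimal plaintext sketch of how a mask vector can stand in for a filtered column: instead of removing records, later computations multiply by the mask. Plain Python integers stand in for secret-shared values here, and the function names are hypothetical.

```python
def masked_count(mask):
    """Number of selected records: the sum of the 0/1 mask."""
    return sum(mask)


def masked_sum(values, mask):
    """Sum over selected records only; unselected ones contribute 0."""
    return sum(v * m for v, m in zip(values, mask))


def masked_mean(values, mask):
    """Mean of the selected records (assumes at least one is selected)."""
    return masked_sum(values, mask) / masked_count(mask)


values = [10, 20, 30, 40]
mask = [1, 0, 1, 0]       # records 1 and 3 pass the filter
selected_mean = masked_mean(values, mask)
```

The vector length stays n throughout, so the number of records that passed the filter is not revealed by the shape of the data.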
Quantiles (1) $Q(p, [\![\vec{a}]\!]) = (1 - \gamma) \cdot [\![a_j]\!] + \gamma \cdot [\![a_{j+1}]\!]$, where $j = \lfloor (n-1)p \rfloor + 1$ and $\gamma = np - \lfloor (n-1)p \rfloor - p$.
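As a plaintext illustration of the formula, here is a short sketch on an already sorted list, with 1-based index j mapped to Python's 0-based indexing. The function name `quantile` and the clamping of the upper index at p = 1 (where the weight is 0 anyway) are assumptions of this sketch.

```python
import math


def quantile(p, sorted_values):
    """Q(p, a) = (1 - gamma) * a_j + gamma * a_{j+1} on a sorted list."""
    n = len(sorted_values)
    j = math.floor((n - 1) * p) + 1               # 1-based index
    gamma = n * p - math.floor((n - 1) * p) - p   # interpolation weight
    lower = sorted_values[j - 1]                  # a_j
    upper = sorted_values[min(j, n - 1)]          # a_{j+1}, clamped at the end
    return (1 - gamma) * lower + gamma * upper


median_even = quantile(0.5, [1, 2, 3, 4])     # interpolates between 2 and 3
median_odd = quantile(0.5, [1, 2, 3, 4, 5])   # picks the middle element
```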
Quantiles (2) Algorithm 2: Privacy-preserving algorithm for finding the five-number summary of a vector that leaks the size of the selected subset.
Data: Input data vector $[\![\vec{a}]\!]$ and corresponding mask vector $[\![\vec{m}]\!]$.
Result: Minimum $[\![min]\!]$, lower quartile $[\![lq]\!]$, median $[\![me]\!]$, upper quartile $[\![uq]\!]$, and maximum $[\![max]\!]$ of $[\![\vec{a}]\!]$ based on the mask vector $[\![\vec{m}]\!]$.
1. $[\![\vec{x}]\!] \leftarrow \mathrm{cut}([\![\vec{a}]\!], [\![\vec{m}]\!])$
2. $[\![\vec{b}]\!] \leftarrow \mathrm{sort}([\![\vec{x}]\!])$
3. $[\![min]\!] \leftarrow [\![b_1]\!]$
4. $[\![max]\!] \leftarrow [\![b_n]\!]$
5. $[\![lq]\!] \leftarrow Q(0.25, [\![\vec{b}]\!])$
6. $[\![me]\!] \leftarrow Q(0.5, [\![\vec{b}]\!])$
7. $[\![uq]\!] \leftarrow Q(0.75, [\![\vec{b}]\!])$
8. return $([\![min]\!], [\![lq]\!], [\![me]\!], [\![uq]\!], [\![max]\!])$
Quantiles (3) Algorithm 3: Privacy-preserving algorithm for finding the five-number summary of a vector that hides the size of the selected subset.
Data: Input data vector $[\![\vec{a}]\!]$ of size $N$ and corresponding mask vector $[\![\vec{m}]\!]$.
Result: Minimum $[\![min]\!]$, lower quartile $[\![lq]\!]$, median $[\![me]\!]$, upper quartile $[\![uq]\!]$, and maximum $[\![max]\!]$ of $[\![\vec{a}]\!]$ based on the mask vector $[\![\vec{m}]\!]$.
1. $([\![\vec{b}]\!], [\![\vec{m}']\!]) \leftarrow \mathrm{sort}^{*}([\![\vec{a}]\!], [\![\vec{m}]\!])$
2. $[\![n]\!] \leftarrow \mathrm{sum}([\![\vec{m}]\!])$
3. $[\![os]\!] \leftarrow N - [\![n]\!]$
4. $[\![min]\!] \leftarrow [\![b_{[\![1+os]\!]}]\!]$
5. $[\![max]\!] \leftarrow [\![b_N]\!]$
6. $[\![lq]\!] \leftarrow Q^{*}(0.25, [\![\vec{a}]\!], [\![os]\!])$
7. $[\![me]\!] \leftarrow Q^{*}(0.5, [\![\vec{a}]\!], [\![os]\!])$
8. $[\![uq]\!] \leftarrow Q^{*}(0.75, [\![\vec{a}]\!], [\![os]\!])$
9. return $([\![min]\!], [\![lq]\!], [\![me]\!], [\![uq]\!], [\![max]\!])$
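The offset idea in Algorithm 3 can be sketched in plaintext. This sketch assumes that sort* moves the unselected elements to the front and sorts the selected ones in ascending order, which is what the offset os = N - n suggests; the slide does not spell out sort* or Q*, so `sort_with_mask` and `quantile_offset` are hypothetical helpers, and the quantile helper is applied to the sorted vector. In a real protocol the offset is secret and the indexing would be done obliviously.

```python
import math


def sort_with_mask(values, mask):
    """Assumed behaviour of sort*: unselected elements first, then selected ascending."""
    pairs = sorted(zip(mask, values))   # mask 0 sorts before mask 1
    return [v for _, v in pairs], [m for m, _ in pairs]


def quantile_offset(p, b, offset):
    """Quantile formula applied to the last len(b) - offset elements of b."""
    n = len(b) - offset
    j = math.floor((n - 1) * p) + 1
    gamma = n * p - math.floor((n - 1) * p) - p
    lower = b[offset + j - 1]
    upper = b[offset + min(j, n - 1)]
    return (1 - gamma) * lower + gamma * upper


def five_number_summary(values, mask):
    """Five-number summary of the selected elements (assumes at least one)."""
    b, _ = sort_with_mask(values, mask)
    total = len(values)            # N, public
    offset = total - sum(mask)     # os = N - n
    return (
        b[offset],                         # min
        quantile_offset(0.25, b, offset),  # lower quartile
        quantile_offset(0.5, b, offset),   # median
        quantile_offset(0.75, b, offset),  # upper quartile
        b[total - 1],                      # max
    )


summary = five_number_summary([7, 1, 9, 3, 5, 100], [1, 1, 1, 1, 1, 0])
```

The vector keeps its full length N throughout, so only the public N is visible from the data layout, not the number of selected elements.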
Descriptive statistics • Five number summary and boxplot • Histogram, frequency table, heatmap • Mean, variance, standard deviation, covariance
Statistical testing [Figure: statistical testing pipelines that differ in where the boundary between private data and public data lies. One pipeline uses public data only; Options 1, 2 and 3 keep part of the pipeline on private data. The pipeline stages are Data, Test statistic, p-value, and Comparison with a Threshold; one option compares the test statistic with a Critical test statistic instead of computing a p-value.]
Statistical tests • t-test, paired t-test • Wilcoxon rank sum test, signed rank test • chi-square test • Multiple testing correction • Bonferroni correction • Benjamini-Hochberg procedure
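For reference, the two multiple testing corrections named above can be sketched on public p-values as follows. This is a plaintext illustration of the standard procedures, not the privacy-preserving variants; the function names are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i if p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank
    with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


p_values = [0.001, 0.02, 0.03, 0.5]
strict = bonferroni(p_values)           # controls the family-wise error rate
relaxed = benjamini_hochberg(p_values)  # controls the false discovery rate
```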
Linear regression (1) • $k$ independent variables $\vec{x}_1, \dots, \vec{x}_k$, one dependent variable $\vec{y}$ • Want to find $\beta_i$ such that $y_j = \beta_k X_{j,k} + \dots + \beta_1 X_{j,1} + \beta_0 X_{j,0} + \varepsilon_j$, with the residuals written as $\vec{\varepsilon} = \vec{y} - X\vec{\beta}$ • Minimise the square of residuals $\|\vec{\varepsilon}\|^2 = \|\vec{y} - X\vec{\beta}\|^2$ • Convert the task to its equivalent characterisation in terms of linear equations $X^T X \vec{\beta} = X^T \vec{y}$
Linear regression (2) • Simple linear regression (one variable) • Matrix inversion (up to four variables) • Gaussian elimination with back substitution • LU decomposition • Conjugate gradient method
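One of the listed methods, Gaussian elimination with back substitution, can be sketched in plaintext on the system $X^T X \vec{\beta} = X^T \vec{y}$. The helper names are hypothetical, floating-point numbers stand in for secret-shared values, and the pivot is chosen by largest absolute value.

```python
def normal_equations(X, y):
    """Build A = X^T X and rhs = X^T y from a row-major design matrix X."""
    k = len(X[0])
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    rhs = [sum(row[a] * yj for row, yj in zip(X, y)) for a in range(k)]
    return A, rhs


def gaussian_solve(A, rhs):
    """Solve A * beta = rhs by elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]   # augmented matrix
    for col in range(n):
        # Pick the row with the largest absolute value in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution, from the last unknown to the first.
    beta = [0.0] * n
    for i in reversed(range(n)):
        known = sum(M[i][c] * beta[c] for c in range(i + 1, n))
        beta[i] = (M[i][n] - known) / M[i][i]
    return beta


# Points on the line y = 1 + 2x; the first column of X is the constant term.
X = [[1.0, x] for x in (0.0, 1.0, 2.0, 3.0)]
y = [1.0, 3.0, 5.0, 7.0]
beta = gaussian_solve(*normal_equations(X, y))
```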
Max Loc Algorithm 10: maxLoc: Finding the first maximum element and its location in a vector in a privacy-preserving setting.
Data: A vector $[\![\vec{a}]\!]$ of length $n$.
Result: The maximum element $[\![b]\!]$ and its location $[\![l]\!]$ in the vector.
1. Let $\pi(j)$ be a permutation of indices $j \in \{1, \dots, n\}$
2. $[\![b]\!] \leftarrow [\![a_{\pi(1)}]\!]$ and $[\![l]\!] \leftarrow \pi(1)$
3. for $i \in \{\pi(2), \dots, \pi(n)\}$ do
4.     $[\![c]\!] \leftarrow (|[\![a_{\pi(i)}]\!]| > |[\![b]\!]|)$
5.     $[\![b]\!] \leftarrow [\![b]\!] - [\![c]\!] \cdot [\![b]\!] + [\![c]\!] \cdot [\![a_{\pi(i)}]\!]$
6.     $[\![l]\!] \leftarrow [\![l]\!] - [\![c]\!] \cdot [\![l]\!] + [\![c]\!] \cdot \pi(i)$
7. end
8. return $([\![b]\!], [\![l]\!])$
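A plaintext sketch of the update rule in maxLoc: the comparison result c is a 0/1 value, and both the running maximum and its location are updated arithmetically, without branching on c. This sketch assumes the comparison is on absolute values, uses 0-based locations, and uses Python's `random.shuffle` as a stand-in for the permutation; plain numbers stand in for secret-shared values.

```python
import random


def max_loc(a):
    """Return the element of largest absolute value and its 0-based location."""
    perm = list(range(len(a)))
    random.shuffle(perm)                  # visit the indices in random order
    b, l = a[perm[0]], perm[0]
    for i in perm[1:]:
        c = int(abs(a[i]) > abs(b))       # 1 if a[i] wins, else 0
        b = b - c * b + c * a[i]          # select a[i] when c == 1, else keep b
        l = l - c * l + c * i             # same selection for the location
    return b, l


value, location = max_loc([3, -8, 5, 1])
```

Because every iteration performs the same arithmetic regardless of c, the sequence of operations does not depend on where the maximum is.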
Rmind demo
https://sharemind.cyber.ee/ sharemind@cyber.ee