Full statistical analyses with secure multi-party computation Dan Bogdanov, Liina Kamm, Ville Sokk dan@cyber.ee http://sharemind.cyber.ee/
The Sharemind model
[Figure: input parties IP 1 ... IP k, computing parties CP 1, CP 2, CP 3, and result parties RP 1 ... RP l. Each input x i is split into shares x i1, x i2, x i3, one per computing party; the result y is reconstructed from the shares y 1, y 2, y 3.]
Step 1: secret sharing and storage of inputs
Step 2: secure multi-party computation
Step 3: reconstruction of results
Secret sharing (simplified)
The secret 75 is split into the three shares 53, 38 and 84.
Sharing: 75 - 53 - 38 = 84 mod 100
Reconstruction: 53 + 38 + 84 = 75 mod 100
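The arithmetic on this slide can be reproduced with a minimal sketch of additive secret sharing. The function names `share` and `reconstruct` are hypothetical and chosen for illustration only; the modulus 100 is the toy value from the slide, and this is not Sharemind's actual implementation.

```python
import secrets

MODULUS = 100  # toy modulus taken from the slide, for illustration only


def share(value, parties=3, modulus=MODULUS):
    """Split `value` into additive shares that sum to it modulo `modulus`."""
    # All shares but the last are uniformly random.
    shares = [secrets.randbelow(modulus) for _ in range(parties - 1)]
    # The last share is fixed so that the sum of all shares equals the value.
    shares.append((value - sum(shares)) % modulus)
    return shares


def reconstruct(shares, modulus=MODULUS):
    """Recombine shares by adding them modulo `modulus`."""
    return sum(shares) % modulus


# The slide's example: two shares are 53 and 38, the third follows from them.
slide_shares = [53, 38, (75 - 53 - 38) % MODULUS]
print(slide_shares, reconstruct(slide_shares))
```

Any single share is a uniformly random number and on its own reveals nothing about the secret; only the sum of all three does.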
MPC from secret sharing
[Figure: parties P_1, P_2, P_3 hold the inputs x_1, x_2, x_3 and obtain the outputs y_1, y_2, y_3.]
Computation: (y_1, y_2, y_3) = f(x_1, x_2, x_3)
All operations are composable.
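As a rough illustration of composability, the sketch below chains two operations on additively shared values without reconstructing the intermediate result. It assumes the toy additive scheme modulo 100 from the previous slide; the helper names `add_shared` and `scale_shared` are hypothetical. Only operations that each party can perform locally are shown; multiplying two secret values needs an interactive protocol that is not sketched here.

```python
import secrets

MODULUS = 100  # toy modulus, an assumption carried over from the sharing example


def share(value, parties=3):
    """Additively share `value` among `parties` parties."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine shares into the plain value."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares; no communication is needed."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


def scale_shared(shares, public_constant):
    """Each party multiplies its share by a public constant."""
    return [(public_constant * s) % MODULUS for s in shares]


x = share(12)
y = share(30)
# Compose two operations on shares: z = 2 * (x + y).
z = scale_shared(add_shared(x, y), 2)
result = reconstruct(z)
print(result)
```

The output of one operation is again a valid sharing, so it can be fed directly into the next operation.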
Strengths / weaknesses
Strengths
• Easy to write code for. Developers apply privacy patterns on classical algorithms.
• Hybrid execution model for balancing public and private computations.
• Very high performance for arithmetic circuits.
• Small storage overhead (3 times for 3 servers).
Weaknesses
• Requires three servers for best possible efficiency (works with 2 to n servers as well).
• Performance profile not immediately intuitive.
• Custom protocols may perform better in some cases.
Genome data and MPC
[Figure, panel A: Secure genome-wide association study workflow. Stages: data acquisition; secure coding and storage; case & control determination; secure statistical testing. Intermediate data: genotype & phenotype data; securely stored genotype & phenotype; case & control group index; results of the study (SNP p<0.1).]
[Figure, panel B: Data acquisition and secure storage. Scenario 1: extended clinical study, where genotype (GATGAG…) from the wetlab and phenotype (age, diseases, ...) from a survey go from the research institution to secure storage and processing. Scenario 2: international consortium study, where gene banks 1 to n (donors D11, …, D1m through Dn1, …, Dnm) send genotype/phenotype data to secure storage and processing.]
[Figure, panel C: Determining cases and controls. Scenario 1: secure 23andMe, where a case/control index vector is built from the available phenotype information. Scenario 2: phenotype-based filtering, where the research institution sends a filtering query on securely stored phenotypes and a securely computed case/control index vector is produced.]
Application development
[Figure: elements shown are the description of the data analysis task (business logic, UX requirements, data model); the SecreC language and the controller library; the server package for the secure application servers; and end user applications for end users (data owners, analysts etc).]
Our competition entry
• Task 2.1
  • Importer (C++/SecreC), ~200 lines of code
  • Analyzer (C++/SecreC), ~200 lines of code
  • Secure operations used: secure integer arithmetic, floating point arithmetic, including division.
• Task 2.2
  • Importer (C++/SecreC), ~200 lines of code
  • Analyzer (C++/SecreC), ~300 lines of code
  • Secure operations used: secure integer arithmetic, shuffling, AES.
The Rmind tool Rmind
Features of Rmind
• Data import: CSV, anything with custom importers
• Descriptive statistics: stdev, var, cov, quantiles, histogram, frequency plots, heatmap
• Quality assurance: filtering, outlier removal with median absolute deviation
• Transformations: sorting, merging, aggregation
• Testing: t-test, chi-square, Cochran-Armitage, transmission disequilibrium, Wilcoxon, Mann-Whitney
• Multiple testing: Bonferroni correction, Benjamini-Hochberg procedure
• Regressions: linear, logistic
• We are continuously implementing new functions.
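To make the quality assurance item concrete, here is a plaintext sketch of outlier removal with the median absolute deviation (MAD). It runs on ordinary, unprotected values and is not Rmind's secure implementation; the function name `mad_filter`, the cutoff of 3 MADs and the sample data are assumptions made for illustration.

```python
from statistics import median


def mad_filter(values, threshold=3.0):
    """Keep values within `threshold` MADs of the median (plaintext sketch)."""
    med = median(values)
    # MAD: the median of absolute deviations from the median.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half of the values equal the median; nothing is dropped.
        return list(values)
    return [v for v in values if abs(v - med) / mad <= threshold]


sample = [10, 11, 9, 10, 12, 10, 95]  # made-up data with one obvious outlier
cleaned = mad_filter(sample)
print(cleaned)
```

For the sample above the median is 10 and the MAD is 1, so 95 lies 85 MADs away and is dropped, while 12 lies 2 MADs away and is kept.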
Legal situation
• In January 2014, the Estonian Data Protection Agency cleared the use of Sharemind/Rmind for education records of Estonian students.
• In January 2015, the Estonian Tax and Customs Board cleared the use of Sharemind/Rmind for analyzing tax records of working students.
• We also have experience in forming contracts with all associated parties under European law.
• The EU PRACTICE project published a legal analysis of the technology from a European perspective. http://practice-project.eu/downloads/publications/D31.1-Risk-assessment-legal-status-PU-M12.pdf
Literature
1. [K15] Liina Kamm. Privacy-preserving statistical analysis using secure multi-party computation. PhD thesis. University of Tartu. 2015. http://hdl.handle.net/10062/45343
2. [BKLS14] Dan Bogdanov, Liina Kamm, Sven Laur, Ville Sokk. Rmind: a tool for cryptographically secure statistical analysis. Cryptology ePrint Archive, Report 2014/512. 2014. http://eprint.iacr.org/2014/512.pdf
3. [KBLV13] Liina Kamm, Dan Bogdanov, Sven Laur, Jaak Vilo. A new way to protect privacy in large-scale genome-wide association studies. Bioinformatics 29 (7): 886-893, 2013. http://bioinformatics.oxfordjournals.org/content/29/7/886
4. [B13] Dan Bogdanov. Sharemind: programmable secure computations with practical applications. PhD thesis. University of Tartu. 2013. http://hdl.handle.net/10062/29041
Acknowledgments
Our entry to the iDASH Privacy & Security Workshop Secure Genome Analysis Competition was prepared with support from http://practice-project.eu/
"The PRACTICE project has received funding from the European Union's Seventh Framework Programme ([FP7/2007-2013]) under grant agreement number ICT-609611."
The information in this document is provided "as is", and no guarantee or warranty is given that the information is fit for any particular purpose. The user thereof uses the information at its sole risk and liability.