Secure Data Sharing and Distribution Platform for Integrated Big Data Utilization


  1. Secure Data Sharing and Distribution Platform for Integrated Big Data Utilization - Handling all data with encryption -
     Oct. 2015 - Mar. 2021, funded by Japan Science and Technology Agency
     Group Members:
     - Waseda University: Hayato YAMANA
     - Institute of Information Security: Atsuhiro GOTO
     - Ochanomizu University: Masato OGUCHI
     - Kogakuin University: Saneyasu YAMAGUCHI
     - The University of Electro-Communications: Takahiko SHINTANI
     - Meiji Pharmaceutical University: Tamotsu NOGUCHI

  2. SD 2 Platform for Integrated Big Data Utilization: Brief Introduction of our Project
     1. Research Background; 2. Objective; 3. Research Goal; 4. Research Strategy; 5. Experiment; 6. Schedule; 7. Progress in 2015FY

  3. 1. Research Background
     "At least 40% of it requires some level of security, from privacy protection to full-encryption 'lockdown.' … Also unfortunately, the amount needing protection will grow …" (*)
     e.g. How should we manage private genome data?
     (*) http://www.emc.com/leadership/digital-universe/2014iview/

  4. 1. Research Background
     Anonymization:
     - Attribute Linkage Model: limitation of k-anonymity, l-diversity, t-closeness
     - Probabilistic Model: differential privacy
     Link Attack: William Weld was governor of Massachusetts and his medical records were in the GIC data. Governor Weld lived in Cambridge. According to the Cambridge voter list, six people had his particular birth date; only three of them were men; and he was the only one in his 5-digit ZIP code.
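The link attack described on this slide can be sketched in a few lines. This is a minimal illustration, not material from the project: the records, names, dates, and ZIP codes are hypothetical, and the helper names (`k_anonymity`, `link`) are assumptions made for the example. It mirrors the slide's scenario: six voters share a birth date, three of them are men, and only one lives in a given ZIP code.

```python
from collections import Counter

# Quasi-identifiers: attributes that are harmless alone but identifying together.
QUASI_IDENTIFIERS = ("birth", "sex", "zip")

# Hypothetical "anonymized" medical records: names removed, quasi-identifiers kept.
medical = [
    {"birth": "1900-01-01", "sex": "M", "zip": "00001", "diagnosis": "condition X"},
    {"birth": "1911-02-02", "sex": "F", "zip": "00002", "diagnosis": "condition Y"},
    {"birth": "1911-02-02", "sex": "F", "zip": "00002", "diagnosis": "condition Z"},
]

# Hypothetical public voter list: six people share one birth date, three are men,
# and only one of those men lives in ZIP 00001.
voters = [
    {"name": "Voter 1", "birth": "1900-01-01", "sex": "M", "zip": "00001"},
    {"name": "Voter 2", "birth": "1900-01-01", "sex": "M", "zip": "00002"},
    {"name": "Voter 3", "birth": "1900-01-01", "sex": "M", "zip": "00003"},
    {"name": "Voter 4", "birth": "1900-01-01", "sex": "F", "zip": "00001"},
    {"name": "Voter 5", "birth": "1900-01-01", "sex": "F", "zip": "00002"},
    {"name": "Voter 6", "birth": "1900-01-01", "sex": "F", "zip": "00003"},
]


def qi_key(record):
    """Project a record onto its quasi-identifier values."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def k_anonymity(records):
    """Size of the smallest group of records sharing the same quasi-identifiers."""
    return min(Counter(qi_key(r) for r in records).values())


def link(medical_records, voter_list):
    """Re-identify medical records whose quasi-identifiers match exactly one voter."""
    names_by_key = {}
    for v in voter_list:
        names_by_key.setdefault(qi_key(v), []).append(v["name"])
    hits = []
    for m in medical_records:
        names = names_by_key.get(qi_key(m), [])
        if len(names) == 1:  # unique match: the record is re-identified
            hits.append((names[0], m["diagnosis"]))
    return hits


k = k_anonymity(medical)
reidentified = link(medical, voters)
```

Here the released table is only 1-anonymous, so joining it with the public list recovers one person's diagnosis even though no name was published.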

  5. 2. Objective
     OUR APPROACH IS NOT ANONYMIZATION.
     OUR APPROACH IS HANDLING ALL DATA WITH ENCRYPTION THROUGHOUT THE DATA LIFE CYCLE.
     YOU CAN ALSO ADOPT ANONYMIZATION IN ADDITION.

  6. 3. Research Goal
     HANDLING ALL DATA WITH ENCRYPTION THROUGHOUT DATA LIFE CYCLE
     [Figure: raw data d1 ... dn from User 1, User 2, ... User i are kept in storage on cloud servers A; analysis on cloud servers B produces knowledge. Caption: Protecting Data from Leaking.]

  7. 3. Research Goal
     HANDLING ALL DATA WITH ENCRYPTION THROUGHOUT DATA LIFE CYCLE
     1. Confidentiality guarantee for kinds of contents, using Fully Homomorphic Encryption (FHE)
     2. Assurance of content source and provenance, using proof of storage
     3. Flexible and assured access control, using attribute-based encryption
     [Figure: raw data d1 ... dn from User 1, User 2, ... User i; storage on cloud servers A; analysis on cloud servers B produces knowledge.]

  8. 1. Confidentiality Guarantee for kinds of Contents
     [Figure: ① create PK & SK; ② encrypt raw data d1 ... dn into Enc(d1) ... Enc(dn), stored on cloud servers A; ③ execute the analysis on cloud servers B, producing Enc(x1) ... Enc(xm); ④ bootstrap. PK: Public Key, SK: Secret Key.]
     - No usable library to execute data mining and machine learning.
     - Execution: 10^3 ~ 10^10 times slower than normal operations without encryption.
     - Noise part invades indispensable part, so bootstrapping is required. Bootstrapping: 6 s/data, 30 times slower than encryption.
     - Key generation (FHE/ideal lattices): 2.5 s/key. Key size: 17MB ~ 2GB, which does NOT FIT the CPU's cache memory.
     - Encryption: 0.2 s/data.
     - Also shown on the slide: 2,200 hour / MB and 7,500 hour / GB.
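To make the "create keys, encrypt, execute on ciphertexts" flow on this slide concrete, here is a toy sketch. It is an assumption-laden illustration, not the project's scheme: it swaps in the textbook Paillier cryptosystem, which is only additively homomorphic (not FHE, no ideal lattices, no bootstrapping), and it uses tiny hard-coded primes that are insecure. All function names are made up for the example.

```python
import math
import random


def keygen(p, q):
    """(1) Create PK & SK from two primes. Toy sizes only; real keys are far larger."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)   # public key n, secret key (lam, mu)


def encrypt(n, m):
    """(2) Encrypt an integer 0 <= m < n under the public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    """(3) Execute on ciphertexts: multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(n, sk, c):
    """Decrypt with the secret key; only the key owner can do this."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


pk, sk = keygen(1009, 1013)           # hypothetical small primes
c_sum = add_encrypted(pk, encrypt(pk, 20), encrypt(pk, 22))
total = decrypt(pk, sk, c_sum)        # the server never saw 20 or 22
```

The server that calls `add_encrypted` holds only the public key and ciphertexts. A fully homomorphic scheme additionally supports multiplication of plaintexts, which is where noise growth and the bootstrapping cost quoted on the slide come in.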

  9. 2. Assurance of content source and provenance
     [Figure: same flow as the previous slide, with a signature attached to every ciphertext. ① create PK & SK; ② encrypt; 2.5 create signature σi using proof of storage, giving pairs Enc(d1) σ1 ... Enc(dn) σn; ③ execute on cloud servers B, producing Enc(x1) σ'1 ... Enc(xm) σ'm; ④ bootstrap. PK: Public Key, SK: Secret Key.]
     Signature size: twice the original data (2TB for 1TB original data)
     - Requires a large storage space
     - Re-tag creation is required depending on the kind of calculation

  10. 3. Flexible and assured access control
     [Figure: same flow as the previous slide (① create PK & SK; ② encrypt; 2.5 create signature σi; ③ execute; ④ bootstrap), plus ⑤ flexible access control using attribute-based encryption. Users carry attributes: User X @JaT; User 1, User 2, User 3 @JST.]
     10^2 ~ 10^3 speedup is indispensable
     - handling "numeric number" as it is, not as character

  11. 3. Research Goal
     BASELINE: CURRENT FHE, PROOF OF STORAGE, ATTRIBUTE-BASED ENCRYPTION
     1,000 TIMES FASTER THAN CURRENT ENCRYPTION METHODS
     TO SHOW THE EFFECTIVENESS OF OUR PLATFORM WITH EXPERIMENTAL DEMONSTRATION

  12. 4. Research Strategy
     Parallelization:
     - (1) For FHE, adopt "Ideal Lattice," whose basic operation is matrix calculation, so that it can be parallelized. Avoid bootstrapping as much as we can.
     - (2) If SWHE (somewhat homomorphic encryption) is applicable at some execution, use it.

  13. 4. Research Strategy
     Off-load Engine / Stream Processing / Migration:
     - Parallelization & adopt FPGA (OUR ORIGINAL)
     - Stream-processing platform called Queue Linker (OUR ORIGINAL)
     - Inter-cloud migration (OUR ORIGINAL)
     I/O tuning / optimization (OUR ORIGINAL)
     Cache unfriendly tuning of workload:
     - Effective use of "memory hierarchy"

       Level       Latency (clock)   Bandwidth
       Registers   1
       L1 cache    4+                330GB/s
       L2 cache    11+               220GB/s
       L3 cache    24+               110GB/s
       DRAM        200-400           10-50GB/s
       SSD         350,000           200MB/s
       HDD         35,000,000+       600MB/s

     - Gap across the hierarchy: 10^7. Adopting a mechanism to bridge the gap.
     - Use Memory Appliance to bridge the gap between SSD and HDD.
     Data Mining Library based on FHE (NEW CHALLENGE)
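The size of the gaps in the memory hierarchy table can be worked out directly from the latency column. The sketch below is only arithmetic on the slide's figures; it assumes the lower bound wherever the slide gives a range or a "+" (for example 200 for DRAM's 200-400), and the names `LATENCY_CLOCKS` and `latency_gaps` are made up for the example.

```python
# Latency in clock cycles, taken from the slide (lower bounds where a range is given).
LATENCY_CLOCKS = {
    "Registers": 1,
    "L1 cache": 4,
    "L2 cache": 11,
    "L3 cache": 24,
    "DRAM": 200,
    "SSD": 350_000,
    "HDD": 35_000_000,
}


def latency_gaps(latency):
    """Ratio of each level's latency to the level directly above it."""
    names = list(latency)
    return {
        f"{upper} -> {lower}": latency[lower] / latency[upper]
        for upper, lower in zip(names, names[1:])
    }


gaps = latency_gaps(LATENCY_CLOCKS)
overall = LATENCY_CLOCKS["HDD"] / LATENCY_CLOCKS["Registers"]

for step, ratio in gaps.items():
    print(f"{step}: x{ratio:,.1f}")
print(f"Registers -> HDD: x{overall:,.0f}")
```

With these figures the largest single step is DRAM to SSD (a factor of 1,750), followed by SSD to HDD (a factor of 100), and the end-to-end ratio is on the order of 10^7.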

  14. 5. Experiment
     Experimental demonstration to show the effectiveness of our platform
     - Life Log Analysis (sensor data)
       - Gathering hundreds of thousands of users' data (raw 1TB data)
       - Analyzing characteristics of human behavior
       - Techniques: proof of storage; verifiable delegation of computation
     - Drug Adverse Analysis (text data)
       - Gathering over 2 million users' drug adverse data and 26 thousand medicinal drugs data
       - In cooperation with pharmacies, estimate users' drug adverse
       - Techniques: proof of storage; secure multiparty computation with fully homomorphic encryption; verifiable delegation of computation; attribute-based encryption

  15. 6. Schedule
     [Chart: timeline 2015 - 2020 with a mid-term evaluation and a final evaluation.]
     - Legal: legal study, legal coordination, guideline, outlook
     - Encryption: parallelizing by using "ideal lattice" based encryption; attribute-based encryption, fully homomorphic encryption, proof of storage; improvement based on LS; computer-architecture-friendly algorithm. Milestones: 20 times faster, over 30 times faster.
     - Parallel & distributed computing platform: pre-fetch, hierarchy of storage, I/O optimization, FPGA, inter-cloud; platform construction at Waseda & Ochanomizu, then continued improvement. Milestones: 20 times faster, over 30 times faster.
     - Encryption × platform = over 10^3 faster (practical use level)
     - Experimental use of platform, life log: over 1,000 logs, then over 10,000 logs
     - Experimental use of platform, drug adverse: several drugstores, then 20 drugstores

  16. 7. PROGRESS IN 2015FY
     - Legal Study: Studied possible data transfer and analysis under the provisions of the 2015 Japanese amendment of the Act on the Protection of Personal Information.
     - Encryption Algorithm: Proposed a theory of FHE for real numbers called FHE4FX. It enables homomorphic greater-than-bit computation.
     - Implementation: Implemented the "Apriori algorithm," 10 times faster than the state-of-the-art method by adopting packing with HElib.
     - Platform: Analyzed I/O performance where data are on the outer/inner zone of the platter with large-scale data access. Prepared our cloud platform between Waseda Univ. and Ochanomizu Univ.
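For readers unfamiliar with the Apriori algorithm named above, the following is a minimal plaintext sketch of it. It is not the project's implementation: it works on unencrypted data, does not use HElib or ciphertext packing, and the function name, sample transactions, and support threshold are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1 candidates: every single item.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: support(c) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical transactions, e.g. sets of items bought together.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(baskets, min_support=3)
```

The dominant cost is the repeated support counting, which in an encrypted setting has to be carried out on ciphertexts; that is the part a packing technique would aim to batch.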

  17. THANK YOU
