
MrLazy: Lazy Runtime Label Propagation for MapReduce

Sherif Akoush, Lucian Carata, Ripduman Sohan, and Andy Hopper. HotCloud 2014, June 2014.


  1. MrLazy: Lazy Runtime Label Propagation for MapReduce • Sherif Akoush, Lucian Carata, Ripduman Sohan, and Andy Hopper • HotCloud 2014, June 2014

  2. Motivation

  3. ITV News

  4. Information Flow Control (IFC) • IFC* • Propagate Record + Sensitivity Metadata • Control Information Flow by Checking Metadata against Policies • But… • Many In-House Computations • No Need for Active Checking • Only When Publishing Some Results • Lazy IFC • Track and Use Lineage • Evaluate Output Labels When Needed • *J. Bacon, D. Eyers, T. Pasquier, J. Singh, I. Papagiannis, and P. Pietzuch, “Information Flow Control for Secure Cloud Computing,” IEEE Transactions on Network and Service Management, 2014.

  5. Labels (Metadata) • More than one Label per Record • Different Country Regulations, Data Quality… • Field-Level • Dynamic Properties • Users Opting In/Out • Sensitivity of Data Expires in 2 Years • New Policies

  6. MapReduce Paradigm • [Diagram: input in the DFS is divided into 64 MB splits (Split 1, Split 2, … Split N), one Map task per split] • Map: (K_IN, V_IN) → (K_MED, V_MED) • Shuffle: groups into (K_MED, List(V_MED)) • Reduce: emits (K_OUT, V_OUT), written to the DFS as File1, File2
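The dataflow on this slide can be sketched in a few lines of plain Java. This is a hypothetical in-memory illustration, not Hadoop code: the class name `MiniMapReduce`, the counting job, and the assignment of records to splits are all assumptions made for the example. The records b, a, b, b, a and the results (a, 2) and (b, 3) follow the example used on the later slides.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of map, shuffle and reduce; not Hadoop code.
class MiniMapReduce {
    // Map: one input record -> one intermediate (K_MED, V_MED) pair, here (record, 1).
    static Map.Entry<String, Integer> map(String record) {
        return new AbstractMap.SimpleEntry<>(record, 1);
    }

    // Shuffle: group intermediate values by key into (K_MED, List(V_MED)).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce: (K_MED, List(V_MED)) -> V_OUT, here the sum of the counts.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run the three phases over a list of splits (each split is a list of records).
    static Map<String, Integer> run(List<List<String>> splits) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (List<String> split : splits) {
            for (String record : split) {
                intermediate.add(map(record));
            }
        }
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(intermediate).entrySet()) {
            output.put(e.getKey(), reduce(e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        // Assumed split layout: the slide does not say which record is in which split.
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("b", "a"), Arrays.asList("b"), Arrays.asList("b", "a"));
        System.out.println(run(splits)); // prints {a=2, b=3}
    }
}
```

In a real Hadoop job the splits live in the DFS and the phases run on different nodes; the sketch only shows how the key and value types change between phases.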

  7. IFC and MapReduce • [Diagram: the MapReduce dataflow of slide 6 with a label on every record: inputs b (l1), a (l2), b (l3), b (l4), a (l5) across Split 1 … Split N; outputs (a, 2) with label l6 in File1 and (b, 3) with label l7 in File2]

  8. Record-Level Lineage for MapReduce • [Diagram: the same dataflow without labels: input records b, a, b, b, a across Split 1 … Split N; output records (a, 2) in File1 and (b, 3) in File2]

  9. Lazy IFC for MapReduce • [Diagram: the same dataflow; input records b, a, b, b, a carry labels l1 to l5, each annotated with q1 to q5 and qn; ƒ(x) appears twice in the dataflow; output records (a, 2) in File1 and (b, 3) in File2 carry no precomputed labels]
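One way to read the lazy approach is sketched below: labels stay on the inputs, lineage records which inputs each output came from, and an output label is computed only when someone asks to publish that output. The sketch assumes that the output label is the union of the source labels, which is a common IFC convention but is not stated on the slide. The class `LazyLabels`, its fields, and the `mayPublish` check are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of lazy label evaluation; not MrLazy's actual data structures.
class LazyLabels {
    // Input record id -> labels attached when the data was ingested.
    final Map<String, Set<String>> inputLabels = new HashMap<>();
    // Output key -> ids of the input records it was derived from (the lineage).
    final Map<String, Set<String>> lineage = new HashMap<>();

    // Called only at publish time. ASSUMPTION: the output label is the union
    // of the labels of all contributing input records.
    Set<String> outputLabel(String outputKey) {
        Set<String> label = new TreeSet<>();
        for (String inputId : lineage.getOrDefault(outputKey, Collections.emptySet())) {
            label.addAll(inputLabels.getOrDefault(inputId, Collections.emptySet()));
        }
        return label;
    }

    // Hypothetical policy check: publishing is allowed if no forbidden label is present.
    boolean mayPublish(String outputKey, Set<String> forbidden) {
        return Collections.disjoint(outputLabel(outputKey), forbidden);
    }

    public static void main(String[] args) {
        LazyLabels labels = new LazyLabels();
        labels.inputLabels.put("in2", new HashSet<>(Arrays.asList("l2")));
        labels.inputLabels.put("in5", new HashSet<>(Arrays.asList("l5")));
        labels.lineage.put("a", new HashSet<>(Arrays.asList("in2", "in5")));
        System.out.println(labels.outputLabel("a")); // prints [l2, l5]
    }
}
```

Because nothing is computed while the job runs, a label that changes after the job has finished is picked up automatically the next time `outputLabel` is called.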

  10. Lineage Capture in Hadoop MapReduce • Record-Level Lineage • No Changes to User Code • Always-On Feature • Treat Lineage for Map and Reduce Tasks Separately • Lineage Reconstruction
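The idea of treating map and reduce lineage separately and reconstructing the full lineage afterwards can be sketched as two small tables that are joined on demand. This is an assumption about the general shape of such a scheme, not a description of the prototype; `LineageStore`, `recordMap`, `recordReduce`, and the record id format are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: lineage is captured per task type and joined only when needed.
class LineageStore {
    // Map-side table: intermediate record id -> input record id.
    final Map<String, String> mapLineage = new HashMap<>();
    // Reduce-side table: output key -> intermediate record ids it consumed.
    final Map<String, List<String>> reduceLineage = new HashMap<>();

    // Would be called by the framework around each map output, not by user code.
    void recordMap(String inputId, String intermediateId) {
        mapLineage.put(intermediateId, inputId);
    }

    // Would be called by the framework for each value a reduce call consumes.
    void recordReduce(String outputKey, String intermediateId) {
        reduceLineage.computeIfAbsent(outputKey, k -> new ArrayList<>()).add(intermediateId);
    }

    // Reconstruction: join the reduce-side and map-side tables to get output -> inputs.
    Set<String> inputsOf(String outputKey) {
        Set<String> inputs = new TreeSet<>();
        for (String mid : reduceLineage.getOrDefault(outputKey, Collections.emptyList())) {
            String inputId = mapLineage.get(mid);
            if (inputId != null) {
                inputs.add(inputId);
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        LineageStore store = new LineageStore();
        store.recordMap("split1:1", "m1");   // input record "a" in Split 1
        store.recordMap("splitN:1", "m2");   // input record "a" in Split N
        store.recordReduce("a", "m1");
        store.recordReduce("a", "m2");
        System.out.println(store.inputsOf("a")); // prints [split1:1, splitN:1]
    }
}
```

Keeping the two tables separate means each task only writes what it sees locally; the join cost is paid later, and only for outputs whose lineage is actually requested.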

  11. Field-Level Enforcement • One Record Can Have Fields With Different Sensitivity • Player Name vs. Passport Number • Field-Level (Conservative) Visibility By Static Analysis

      map(Text key, Text value) {
          String[] str = value.toString().split(",");
          Text name = new Text(str[0]);   // only field 0 (the name) reaches the output
          write(name, 1);
      }

  12. Prototype Evaluation • Implementation in Hadoop MapReduce • 7-node Cluster • Dataset from BigDataBench: 120 GB • Join and Filter Job

  13. Overheads (Lineage Capture) • Storage • 50% of Output • Delete When Not Needed • Trading Space for Time • [Chart: Runtime, y-axis 0% to 140%; labels: Base, With Lineage, Lineage Reconstruction]

  14. Policy 1: Users Opt-out of Data Sharing • 5% of Users • [Chart: y-axis 0% to 120%; series: Naive (Recomputation), MrLazy]
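A rough sketch of how lineage might avoid recomputation for this policy: instead of re-running the job without the opted-out users, look up which outputs have an opted-out user's record in their lineage. The slide does not describe the mechanism, so the class `OptOutPolicy`, the method `affectedOutputs`, and the `ownerOf` table are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: find outputs touched by opted-out users via lineage lookups.
class OptOutPolicy {
    // lineage:  output key -> input record ids it was derived from
    // ownerOf:  input record id -> user who owns that record
    // optedOut: users who withdrew consent after the job ran
    static Set<String> affectedOutputs(Map<String, Set<String>> lineage,
                                       Map<String, String> ownerOf,
                                       Set<String> optedOut) {
        Set<String> affected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : lineage.entrySet()) {
            for (String inputId : entry.getValue()) {
                if (optedOut.contains(ownerOf.get(inputId))) {
                    affected.add(entry.getKey());
                    break; // one opted-out source is enough to flag the output
                }
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> lineage = new HashMap<>();
        lineage.put("a", new HashSet<>(Arrays.asList("in2", "in5")));
        lineage.put("b", new HashSet<>(Arrays.asList("in1", "in3", "in4")));
        Map<String, String> ownerOf = new HashMap<>();
        ownerOf.put("in1", "alice");
        ownerOf.put("in2", "bob");
        ownerOf.put("in3", "carol");
        ownerOf.put("in4", "alice");
        ownerOf.put("in5", "dave");
        Set<String> optedOut = new HashSet<>(Arrays.asList("bob"));
        System.out.println(affectedOutputs(lineage, ownerOf, optedOut)); // prints [a]
    }
}
```

Only the flagged outputs would then need to be withheld or recomputed; the rest of the job's results can be used as they are.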

  15. Policy 2: Sensitivity of Data Lasts 2 Years • Dynamic Behaviour • [Chart: y-axis 0% to 120%; series: Naive (Recomputation), MrLazy]
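A time-limited sensitivity is a label whose value depends on when it is checked, which is why evaluating it lazily is attractive. The sketch below assumes each input record has a creation date and that an output is sensitive while any of its source records is still inside the two-year window; both the rule for combining sources and the names `ExpiringSensitivity`, `isSensitive`, and `outputIsSensitive` are assumptions, not taken from the slides.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.Collection;

// Hypothetical sketch of a label that expires; evaluated at check time, not at job time.
class ExpiringSensitivity {
    // Taken from the slide title: sensitivity lasts 2 years.
    static final int SENSITIVE_YEARS = 2;

    // A record is sensitive strictly before the second anniversary of its creation.
    static boolean isSensitive(LocalDate createdOn, LocalDate today) {
        return today.isBefore(createdOn.plusYears(SENSITIVE_YEARS));
    }

    // ASSUMPTION: an output is sensitive if any record in its lineage still is.
    static boolean outputIsSensitive(Collection<LocalDate> sourceDates, LocalDate today) {
        for (LocalDate createdOn : sourceDates) {
            if (isSensitive(createdOn, today)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Collection<LocalDate> sources = Arrays.asList(
                LocalDate.of(2011, 1, 15), LocalDate.of(2012, 6, 1));
        System.out.println(outputIsSensitive(sources, LocalDate.of(2014, 5, 31))); // prints true
        System.out.println(outputIsSensitive(sources, LocalDate.of(2014, 6, 1)));  // prints false
    }
}
```

The same output changes from sensitive to non-sensitive without the job being re-run, because the source dates are reached through lineage and compared with the current date at check time.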

  16. Other Challenges • Dealing with State • In-lining Instructions to Expose State • TopK • Subtle Data Leakage • Differential Privacy

  17. Conclusion • Delay Output Label (Metadata) Computation • Fine-Grained Lineage as Audit Mechanism • Non-Prohibitive Overheads • Future Work: • Reducing Overheads • Large-Scale Evaluation • Recomputation-Based Recovery from Failures

  18. Thanks Sherif.Akoush@cl.cam.ac.uk http://www.cl.cam.ac.uk/~sa497/
