MrLazy: Lazy Runtime Label Propagation for MapReduce
Sherif Akoush, Lucian Carata, Ripduman Sohan, and Andy Hopper
HotCloud 2014, June 2014

Motivation

ITV News

Information Flow Control (IFC)
• IFC*
  • Propagate Record + Sensitivity Metadata
  • Control Information Flow by Checking Metadata against Policies
• But…
  • Many In-House Computations
  • No Need for Active Checking
  • Only When Publishing Some Results
• Lazy IFC
  • Track and Use Lineage
  • Evaluate Output Labels When Needed
*J. Bacon, D. Eyers, T. Pasquier, J. Singh, I. Papagiannis, and P. Pietzuch, “Information Flow Control for Secure Cloud Computing,” IEEE Transactions on Network and Service Management, 2014.

Labels (Metadata)
• More than one Label per Record
  • Different Country Regulations, Data Quality…
• Field-Level
• Dynamic Properties
  • Users Opting In/Out
  • Sensitivity of Data Expires in 2 Years
  • New Policies

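MapReduce Paradigm
[Diagram: input in the DFS is divided into 64 MB splits (Split 1, Split 2, … Split N). Each Map task reads (K_IN, V_IN) records and emits (K_MED, V_MED) pairs. The Shuffle groups them into (K_MED, List(V_MED)). Each Reduce task emits (K_OUT, V_OUT) records to output files (File1, File2) in the DFS.]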
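The dataflow on this slide can be sketched in memory as follows. This is a minimal illustration in plain Java, not Hadoop code: the class name `MiniMapReduce` and the choice of a counting job are assumptions, picked to match the a/b records used in the later diagrams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of map -> shuffle -> reduce (not the Hadoop API).
final class MiniMapReduce {

    // Runs a counting job: map emits (record, 1), reduce sums the values per key.
    static Map<String, Integer> run(List<List<String>> splits) {
        // Map + shuffle: group the intermediate values by intermediate key.
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (List<String> split : splits) {
            for (String record : split) {
                shuffled.computeIfAbsent(record, k -> new ArrayList<>()).add(1);
            }
        }
        // Reduce: one output record per intermediate key.
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffled.entrySet()) {
            int sum = 0;
            for (int v : group.getValue()) {
                sum += v;
            }
            output.put(group.getKey(), sum);
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("b", "a"), Arrays.asList("b"), Arrays.asList("b", "a"));
        System.out.println(run(splits)); // prints {a=2, b=3}
    }
}
```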

IFC and MapReduce
[Diagram: the MapReduce dataflow with labels attached to records. The input records b, a, b, b, a carry labels l1 to l5; the output records (a, 2) in File1 and (b, 3) in File2 carry labels l6 and l7.]
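A minimal sketch of eager label propagation for the counting job in this diagram. It assumes, purely for illustration, that an output label is the set of labels of all contributing inputs; the slide does not say how l6 and l7 are derived, and `EagerLabels` and `Labeled` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: labels travel with the records during the job.
final class EagerLabels {

    // An input record: its key plus the label attached to it.
    static final class Labeled {
        final String key;
        final String label;

        Labeled(String key, String label) {
            this.key = key;
            this.label = label;
        }
    }

    // Assumption: the label set of an output record is the union of the labels
    // of every input record grouped under the same key.
    static Map<String, Set<String>> outputLabels(List<Labeled> inputs) {
        Map<String, Set<String>> labels = new TreeMap<>();
        for (Labeled in : inputs) {
            labels.computeIfAbsent(in.key, k -> new TreeSet<>()).add(in.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Labeled> inputs = Arrays.asList(
                new Labeled("b", "l1"), new Labeled("a", "l2"), new Labeled("b", "l3"),
                new Labeled("b", "l4"), new Labeled("a", "l5"));
        System.out.println(outputLabels(inputs)); // prints {a=[l2, l5], b=[l1, l3, l4]}
    }
}
```

Under this scheme the label work is paid on every run, whether or not the output is ever published.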

Record-Level Lineage for MapReduce
[Diagram: the same dataflow without labels. The input records b, a, b, b, a flow through Map, Shuffle and Reduce to the output records (a, 2) in File1 and (b, 3) in File2.]

Lazy IFC for MapReduce
[Diagram: the same dataflow. The input records are annotated with l1 to l5, q1 to q5 and qn, and ƒ(x) is shown at the output records (a, 2) and (b, 3).]
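The lazy approach can be sketched as follows, under stated assumptions: the job stores only lineage (which input records fed each output), input labels live in a separate table that may change later, and an output's labels are computed on request as the union of the current input labels. The class `LazyLabels` and its methods are hypothetical and are not taken from the MrLazy prototype.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: keep lineage at job time, evaluate labels only when asked.
final class LazyLabels {
    // Lineage captured at job time: output key -> ids of contributing input records.
    private final Map<String, List<Integer>> lineage = new HashMap<>();
    // Current label of each input record; may change after the job has run.
    private final Map<Integer, String> inputLabels = new HashMap<>();

    // Called while the job runs: remember which input fed which output key.
    void recordInput(int id, String outputKey, String label) {
        lineage.computeIfAbsent(outputKey, k -> new ArrayList<>()).add(id);
        inputLabels.put(id, label);
    }

    // A label can be updated later without touching the job output.
    void relabel(int id, String label) {
        inputLabels.put(id, label);
    }

    // Evaluated only when an output record is about to be published.
    Set<String> labelsOf(String outputKey) {
        Set<String> result = new TreeSet<>();
        for (int id : lineage.getOrDefault(outputKey, Collections.<Integer>emptyList())) {
            result.add(inputLabels.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        LazyLabels job = new LazyLabels();
        job.recordInput(1, "b", "l1");
        job.recordInput(2, "a", "l2");
        job.recordInput(5, "a", "l5");
        System.out.println(job.labelsOf("a"));
        job.relabel(5, "public");
        System.out.println(job.labelsOf("a"));
    }
}
```

Because labels are looked up at query time, a relabelled input is reflected in the next check without rerunning the job.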

Lineage Capture in Hadoop MapReduce
• Record-Level Lineage
• No Changes to User Code
• Always-On Feature
• Treat Lineage for Map and Reduce Tasks Separately
• Lineage Reconstruction
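One way to picture reconstruction from separately captured map and reduce lineage is a join on the intermediate key. The sketch below is an assumption about the general idea only: the storage layout, the names `LineageJoin` and `reconstruct`, and the use of record offsets as input ids are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: join map-side and reduce-side lineage on the intermediate key.
final class LineageJoin {

    // mapLineage:    intermediate key -> input record offsets seen by map tasks.
    // reduceLineage: output record id -> intermediate keys consumed by the reducer.
    static Map<String, Set<Long>> reconstruct(Map<String, List<Long>> mapLineage,
                                              Map<String, List<String>> reduceLineage) {
        Map<String, Set<Long>> full = new TreeMap<>();
        for (Map.Entry<String, List<String>> out : reduceLineage.entrySet()) {
            Set<Long> inputs = new TreeSet<>();
            for (String medKey : out.getValue()) {
                inputs.addAll(mapLineage.getOrDefault(medKey, Collections.<Long>emptyList()));
            }
            full.put(out.getKey(), inputs);
        }
        return full;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> mapLineage = new HashMap<>();
        mapLineage.put("a", Arrays.asList(1L, 4L));
        mapLineage.put("b", Arrays.asList(0L, 2L, 3L));
        Map<String, List<String>> reduceLineage = new HashMap<>();
        reduceLineage.put("File1:0", Arrays.asList("a"));
        reduceLineage.put("File2:0", Arrays.asList("b"));
        System.out.println(reconstruct(mapLineage, reduceLineage));
    }
}
```

Keeping the two halves separate means each task only writes what it already knows; the join cost is paid later, and only if the lineage is actually queried.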

Field-Level Enforcement
• One Record Can Have Fields With Different Sensitivity
  • Player Name vs. Passport Number
• Field-Level (Conservative) Visibility By Static Analysis

    map(Text key, Text value) {
        String[] str = value.toString().split(",");
        Text name = new Text(str[0]);  // only the first field (the name) is used
        write(name, 1);
    }
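Given the result of such an analysis, the output label only needs to cover the fields that can reach the output. The sketch below assumes the analysis result is available as a set of field indices; `FieldVisibility` and `outputLabels` are hypothetical names, and the static analysis itself is not shown.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: restrict output labels to the fields a map function reads.
final class FieldVisibility {

    // fieldLabels[i] is the label of field i of the input record.
    // readFields holds the field indices that may reach the output; null means
    // the analysis could not decide, so every field is conservatively included.
    static Set<String> outputLabels(String[] fieldLabels, Set<Integer> readFields) {
        Set<String> labels = new TreeSet<>();
        for (int i = 0; i < fieldLabels.length; i++) {
            if (readFields == null || readFields.contains(i)) {
                labels.add(fieldLabels[i]);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Field 0: player name, field 1: passport number.
        String[] fieldLabels = {"public", "secret"};
        System.out.println(outputLabels(fieldLabels, Collections.singleton(0)));
        System.out.println(outputLabels(fieldLabels, null));
    }
}
```

For the map function above, which reads only str[0], the passport number's label would not taint the output.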

Prototype Evaluation
• Implementation in Hadoop MapReduce
• 7-node Cluster
• Dataset from BigDataBench: 120 GB
• Join and Filter Job

Overheads (Lineage Capture)
• Storage
  • 50% of Output
  • Delete When Not Needed
  • Trading Space for Time
[Chart: Runtime, axis 0% to 140%, for Base and With Lineage; legend entry: Lineage Reconstruction.]

Policy 1: Users Opt-out of Data Sharing
• 5% of Users
[Chart: Naive (Recomputation) vs. MrLazy, axis 0% to 120%.]
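With lineage available, an opt-out can be handled by looking up which outputs depend on the affected users instead of recomputing the whole job. The sketch below shows only that lookup; it assumes lineage has already been resolved to user ids, and `OptOutPolicy` and `affectedOutputs` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: find the outputs that depend on opted-out users.
final class OptOutPolicy {

    // lineage: output key -> ids of the users whose records contributed to it.
    static Set<String> affectedOutputs(Map<String, List<String>> lineage,
                                       Set<String> optedOut) {
        Set<String> affected = new TreeSet<>();
        for (Map.Entry<String, List<String>> entry : lineage.entrySet()) {
            for (String user : entry.getValue()) {
                if (optedOut.contains(user)) {
                    affected.add(entry.getKey());
                    break; // one opted-out contributor is enough
                }
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        Map<String, List<String>> lineage = new HashMap<>();
        lineage.put("a", Arrays.asList("u2", "u5"));
        lineage.put("b", Arrays.asList("u1", "u3", "u4"));
        System.out.println(affectedOutputs(lineage, Collections.singleton("u3"))); // prints [b]
    }
}
```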

Policy 2: Sensitivity of Data Lasts 2 Years
• Dynamic Behaviour
[Chart: Naive (Recomputation) vs. MrLazy, axis 0% to 120%.]
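A time-dependent policy like this one is a natural fit for lazy evaluation, because the answer depends on when the check is made. The sketch below assumes each contributing input record has a creation date and that an output stays sensitive while any contributor is younger than two years; `ExpiringSensitivity` and `isSensitive` are hypothetical names.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: sensitivity that expires two years after record creation.
final class ExpiringSensitivity {

    // inputDates: creation dates of the input records found through lineage.
    // Returns true while at least one contributor is still within its 2-year window.
    static boolean isSensitive(List<LocalDate> inputDates, LocalDate today) {
        for (LocalDate created : inputDates) {
            if (created.plusYears(2).isAfter(today)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<LocalDate> dates = Arrays.asList(
                LocalDate.of(2012, 1, 10), LocalDate.of(2012, 3, 1));
        System.out.println(isSensitive(dates, LocalDate.of(2013, 6, 1))); // prints true
        System.out.println(isSensitive(dates, LocalDate.of(2014, 6, 1))); // prints false
    }
}
```

The same output record changes status over time without being recomputed; only the check is repeated.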

Other Challenges
• Dealing with State
  • In-lining Instructions to Expose State
  • TopK
• Subtle Data Leakage
  • Differential Privacy

Conclusion
• Delay Output Label (Metadata) Computation
• Fine-Grained Lineage as Audit Mechanism
• Non-Prohibitive Overheads
• Future Work:
  • Reducing Overheads
  • Large-Scale Evaluation
  • Recomputation-Based Recovery from Failures

Thanks Sherif.Akoush@cl.cam.ac.uk http://www.cl.cam.ac.uk/~sa497/
