Large-scale Data Mining: MapReduce and beyond Part 1: Basics - PowerPoint PPT Presentation

Example – Programming model mapper employees.txt def getName (line): # LAST FIRST SALARY return line.split(‘\t’)[1] Smith John $90,000 Brown David $70,000 Johnson George $95,000 def addCounts (hist, name): Yates John $80,000 hist[name] = \ Miller Bill $65,000 Moore Jack $85,000 hist.get(name,default=0) + 1 Taylor Fred $75,000 return hist Smith David $80,000 Harris John $90,000 ... ... ... input = open(‘employees.txt’, ‘r’) ... ... ... intermediate = map(getName, input) Q: “What is the frequency of each first name?” 10 Monday, August 23, 2010

Example – Programming model mapper employees.txt def getName (line): # LAST FIRST SALARY return line.split(‘\t’)[1] Smith John $90,000 Brown David $70,000 reducer Johnson George $95,000 def addCounts (hist, name): Yates John $80,000 hist[name] = \ Miller Bill $65,000 Moore Jack $85,000 hist.get(name,default=0) + 1 Taylor Fred $75,000 return hist Smith David $80,000 Harris John $90,000 ... ... ... input = open(‘employees.txt’, ‘r’) ... ... ... intermediate = map(getName, input) result = reduce(addCounts, \ Q: “What is the frequency intermediate, {}) of each first name?” 10 Monday, August 23, 2010

Example – Programming model mapper employees.txt def getName (line): # LAST FIRST SALARY return (line.split(‘\t’)[1], 1) Smith John $90,000 Brown David $70,000 reducer Johnson George $95,000 def addCounts (hist, (name, c)): Yates John $80,000 hist[name] = \ Miller Bill $65,000 Moore Jack $85,000 hist.get(name,default=0) + c Taylor Fred $75,000 return hist Smith David $80,000 Harris John $90,000 ... ... ... input = open(‘employees.txt’, ‘r’) ... ... ... intermediate = map(getName, input) result = reduce(addCounts, \ Q: “What is the frequency intermediate, {}) of each first name?” 11 Monday, August 23, 2010

Example – Programming model mapper employees.txt def getName (line): # LAST FIRST SALARY return (line.split(‘\t’)[1], 1) Smith John $90,000 Brown David $70,000 reducer Johnson George $95,000 def addCounts (hist, (name, c)): Yates John $80,000 hist[name] = \ Miller Bill $65,000 Moore Jack $85,000 hist.get(name,default=0) + c Taylor Fred $75,000 return hist Smith David $80,000 Harris John $90,000 ... ... ... input = open(‘employees.txt’, ‘r’) ... ... ... intermediate = map(getName, input) result = reduce(addCounts, \ Q: “What is the frequency Key-value iterators intermediate, {}) of each first name?” 11 Monday, August 23, 2010

Example – Programming model Hadoop / Java public class HistogramJob extends Configured implements Tool { public static class FieldMapper extends MapReduceBase implements Mapper<LongWritable,Text,Text,LongWritable> { private static LongWritable ONE = new LongWritable(1); private static Text firstname = new Text(); @Override public void map (LongWritable key, Text value, OutputCollector<Text,LongWritable> out, Reporter r) { firstname.set(value.toString().split(“\t”)[1]); output.collect(firstname, ONE); } } // class FieldMapper 12 Monday, August 23, 2010

Example – Programming model Hadoop / Java public class HistogramJob extends Configured implements Tool { public static class FieldMapper extends MapReduceBase implements Mapper<LongWritable,Text,Text,LongWritable> { private static LongWritable ONE = new LongWritable(1); private static Text firstname = new Text(); @Override public void map (LongWritable key, Text value, OutputCollector<Text,LongWritable> out, Reporter r) { firstname.set(value.toString().split(“\t”)[1]); output.collect(firstname, ONE); } non-boilerplate } // class FieldMapper 12 Monday, August 23, 2010

Example – Programming model Hadoop / Java public class HistogramJob extends Configured implements Tool { public static class FieldMapper extends MapReduceBase implements Mapper<LongWritable,Text,Text,LongWritable> { typed… private static LongWritable ONE = new LongWritable(1); private static Text firstname = new Text(); @Override public void map (LongWritable key, Text value, OutputCollector<Text,LongWritable> out, Reporter r) { firstname.set(value.toString().split(“\t”)[1]); output.collect(firstname, ONE); } non-boilerplate } // class FieldMapper 12 Monday, August 23, 2010

Example – Programming model Hadoop / Java public static class LongSumReducer extends MapReduceBase implements Mapper<LongWritable,Text,Text,LongWritable> { private static LongWritable sum = new LongWritable(); @Override public void reduce (Text key, Iterator<LongWritable> vals, OutputCollector<Text,LongWritable> out, Reporter r) { long s = 0; while (vals.hasNext()) s += vals.next().get(); sum.set(s); output.collect(key, sum); } } // class LongSumReducer 13 Monday, August 23, 2010

Example – Programming model Hadoop / Java public int run (String[] args) throws Exception { JobConf job = new JobConf(getConf(), HistogramJob.class); job.setJobName(“Histogram”); FileInputFormat.setInputPaths(job, args[0]); job.setMapperClass(FieldMapper.class); job.setCombinerClass(LongSumReducer.class); job.setReducerClass(LongSumReducer.class); // ... JobClient. runJob (job); return 0; } // run() public static main (String[] args) throws Exception { ToolRunner. run (new Configuration(), new HistogramJob(), args); } // main() } // class HistogramJob 14 Monday, August 23, 2010

Example – Programming model Hadoop / Java public int run (String[] args) throws Exception { JobConf job = new JobConf(getConf(), HistogramJob.class); job.setJobName(“Histogram”); FileInputFormat.setInputPaths(job, args[0]); job.setMapperClass(FieldMapper.class); job.setCombinerClass(LongSumReducer.class); job.setReducerClass(LongSumReducer.class); // ... JobClient. runJob (job); return 0; } // run() public static main (String[] args) throws Exception { ToolRunner. run (new Configuration(), new HistogramJob(), args); } // main() } // class HistogramJob ~ 30 lines = 25 boilerplate (Eclipse) + 5 actual code 14 Monday, August 23, 2010

MapReduce for…  Distributed clusters  Google’s original  Hadoop (Apache Software Foundation)  Hardware  SMP/CMP: Phoenix (Stanford)  Cell BE  Other  Skynet (in Ruby/DRB)  QtConcurrent  BashReduce  …many more 15 Monday, August 23, 2010

Recap Quick-n-dirty script Hadoop vs ~5 lines of (non-bo on-boilerplate) code Single machine, Up to thousands of local drive machines and drives 16 Monday, August 23, 2010

Recap Quick-n-dirty script Hadoop vs ~5 lines of (non-bo on-boilerplate) code Single machine, Up to thousands of local drive machines and drives What is hidden to achieve this:  Data partitioning, placement and replication  Computation placement (and replication)  Number of nodes (mappers / reducers)  … 16 Monday, August 23, 2010

Recap Quick-n-dirty script Hadoop vs ~5 lines of (non-bo on-boilerplate) code Single machine, Up to thousands of local drive machines and drives What is hidden to achieve this:  Data partitioning, placement and replication  Computation placement (and replication)  Number of nodes (mappers / reducers)  … As a programmer, you don’t need to know what I’m about to show you next… 16 Monday, August 23, 2010

Execution model: Flow Input file SPLIT 0 MAPPER Output file REDUCER SPLIT 1 PART 0 MAPPER REDUCER SPLIT 2 PART 1 MAPPER SPLIT 3 MAPPER 17 Monday, August 23, 2010

Execution model: Flow Key/value Input file iterators SPLIT 0 MAPPER Output file REDUCER SPLIT 1 PART 0 MAPPER REDUCER SPLIT 2 PART 1 MAPPER SPLIT 3 MAPPER 17 Monday, August 23, 2010

Execution model: Flow Key/value Input file iterators SPLIT 0 MAPPER Output file REDUCER SPLIT 1 PART 0 MAPPER REDUCER SPLIT 2 PART 1 MAPPER SPLIT 3 MAPPER Sequential scan 17 Monday, August 23, 2010

Execution model: Flow Key/value Input file iterators Smith John $90,000 SPLIT 0 MAPPER Output file REDUCER SPLIT 1 PART 0 MAPPER REDUCER SPLIT 2 Yates John $80,000 PART 1 MAPPER SPLIT 3 MAPPER Sequential scan 17 Monday, August 23, 2010

Execution model: Flow Key/value Input file iterators John 1 SPLIT 0 MAPPER Output file REDUCER SPLIT 1 PART 0 MAPPER REDUCER SPLIT 2 John 1 PART 1 MAPPER SPLIT 3 MAPPER Sequential scan 17 Monday, August 23, 2010

Execution model: Flow Key/value Input file iterators John 1 SPLIT 0 MAPPER Output file REDUCER SPLIT 1 PART 0 MAPPER REDUCER SPLIT 2 John 1 PART 1 MAPPER SPLIT 3 MAPPER All-to-all, hash partitioning Sequential scan 17 Monday, August 23, 2010

Execution model: Flow Key/value Input file iterators SPLIT 0 MAPPER Output file John 2 REDUCER SPLIT 1 PART 0 MAPPER REDUCER SPLIT 2 PART 1 MAPPER SPLIT 3 MAPPER Sort-merge All-to-all, hash partitioning Sequential scan 17 Monday, August 23, 2010

Execution model: Flow Key/value Input file iterators SPLIT 0 MAPPER Output file REDUCER John 2 SPLIT 1 PART 0 MAPPER REDUCER SPLIT 2 PART 1 MAPPER SPLIT 3 MAPPER Sort-merge All-to-all, hash partitioning Sequential scan 17 Monday, August 23, 2010

Execution model: Placement HOST 1 HOST 2 SPLIT 0 SPLIT 4 SPLIT 3 SPLIT 2 Replica 2/3 Replica 1/3 Replica 3/3 Replica 2/3 HOST 3 SPLIT 3 HOST 0 SPLIT 0 Replica 1/3 SPLIT 2 SPLIT 1 Replica 3/3 SPLIT 0 SPLIT 1 Replica 3/3 Replica 1/3 Replica 1/3 Replica 2/3 SPLIT 4 SPLIT 3 Replica 2/3 Replica 2/3 HOST 5 HOST 4 HOST 6 18 Monday, August 23, 2010

Execution model: Placement HOST 1 HOST 2 SPLIT 0 SPLIT 4 SPLIT 3 SPLIT 2 Replica 2/3 Replica 1/3 Replica 3/3 Replica 2/3 MAPPER MAPPER HOST 3 SPLIT 3 HOST 0 SPLIT 0 Replica 1/3 SPLIT 2 SPLIT 1 Replica 3/3 SPLIT 0 SPLIT 1 Replica 3/3 Replica 1/3 Replica 1/3 Replica 2/3 MAPPER MAPPER SPLIT 4 SPLIT 3 Replica 2/3 Replica 2/3 HOST 5 HOST 4 HOST 6 Computation co-located with data (as much as possible) 18 Monday, August 23, 2010

Execution model: Placement HOST 1 HOST 2 SPLIT 0 SPLIT 4 SPLIT 3 SPLIT 2 Replica 2/3 Replica 1/3 Replica 3/3 Replica 2/3 MAPPER MAPPER HOST 3 SPLIT 3 HOST 0 SPLIT 0 Replica 1/3 SPLIT 2 SPLIT 1 Replica 3/3 SPLIT 0 SPLIT 1 Replica 3/3 Replica 1/3 Replica 1/3 Replica 2/3 MAPPER MAPPER SPLIT 4 SPLIT 3 Replica 2/3 Replica 2/3 HOST 5 HOST 4 HOST 6 19 Monday, August 23, 2010

Execution model: Placement HOST 1 HOST 2 SPLIT 0 SPLIT 4 SPLIT 3 SPLIT 2 Replica 2/3 Replica 1/3 Replica 3/3 Replica 2/3 MAPPER MAPPER HOST 3 SPLIT 3 HOST 0 SPLIT 0 REDUCER Replica 1/3 SPLIT 2 SPLIT 1 Replica 3/3 SPLIT 0 SPLIT 1 Replica 3/3 Replica 1/3 Replica 1/3 Replica 2/3 MAPPER MAPPER SPLIT 4 SPLIT 3 Replica 2/3 Replica 2/3 HOST 5 HOST 4 HOST 6 19 Monday, August 23, 2010

Execution model: Placement HOST 1 HOST 2 SPLIT 0 SPLIT 4 SPLIT 3 SPLIT 2 Replica 2/3 Replica 1/3 Replica 3/3 Replica 2/3 MAPPER MAPPER HOST 3 SPLIT 3 HOST 0 SPLIT 0 REDUCER Replica 1/3 SPLIT 2 SPLIT 1 Replica 3/3 SPLIT 0 SPLIT 1 Replica 3/3 Replica 1/3 Replica 1/3 Replica 2/3 MAPPER MAPPER SPLIT 4 SPLIT 3 Replica 2/3 Replica 2/3 HOST 5 HOST 4 HOST 6 Rack/network-aware 19 Monday, August 23, 2010

Execution model: Placement HOST 1 HOST 2 SPLIT 0 SPLIT 4 SPLIT 3 SPLIT 2 Replica 2/3 Replica 1/3 Replica 3/3 Replica 2/3 MAPPER MAPPER HOST 3 C C SPLIT 3 HOST 0 SPLIT 0 REDUCER Replica 1/3 SPLIT 2 SPLIT 1 Replica 3/3 SPLIT 0 SPLIT 1 Replica 3/3 Replica 1/3 Replica 1/3 Replica 2/3 MAPPER C MAPPER SPLIT 4 C SPLIT 3 Replica 2/3 Replica 2/3 HOST 5 HOST 4 HOST 6 C COMBINER Rack/network-aware 19 Monday, August 23, 2010

MapReduce Summary 20 Monday, August 23, 2010

MapReduce Summary  Simple programming model  Scalable, fault-tolerant  Ideal for (pre-)processing large volumes of data 20 Monday, August 23, 2010

MapReduce Summary  Simple programming model  Scalable, fault-tolerant  Ideal for (pre-)processing large volumes of data ‘However, if the data center is the computer, it leads to the even more intriguing question “What is the equivalent of the ADD instruction for a data center?” […] If MapReduce is the first instruction of the “data center computer” , I can’t wait to see the rest of the instruction set, as well as the data center programming language, the data center operating system, the data center storage systems, and more.’ – David Patterson, “ The Data Center Is The Computer ”, CACM, Jan. 2008 20 Monday, August 23, 2010

Outline  Introduction  MapReduce & distributed storage  Hadoop  HBase  Pig  Cascading  Hive  Summary 21 Monday, August 23, 2010

Hadoop HBase Pig Hive Chukwa Zoo MapReduce HDFS Keeper Core Avro 22 Monday, August 23, 2010

Hadoop HBase Pig Hive Chukwa Zoo MapReduce HDFS Keeper Hadoop’s stated mission (Doug Cutting interview): Core Avro Commoditize infrastructure for web-scale, data-intensive applications 22 Monday, August 23, 2010

Who uses Hadoop?  Yahoo!  Facebook  Last.fm  Rackspace  Digg  Apache Nutch  … more in part 3 23 Monday, August 23, 2010

Hadoop HBase Pig Hive Chukwa Filesystems and I/O:  Abstraction APIs Zoo MapReduce HDFS  RPC / Persistence Keeper Core Avro 24 Monday, August 23, 2010

Hadoop HBase Pig Hive Chukwa Cross-language serialization:  RPC / persistence Zoo MapReduce HDFS  ~ Google ProtoBuf / FB Thrift Keeper Core Avro 25 Monday, August 23, 2010

Hadoop Distributed execution (batch)  Programming model HBase Pig Hive Chukwa  Scalability / fault-tolerance Zoo MapReduce HDFS Keeper Core Avro 26 Monday, August 23, 2010

Hadoop Distributed storage (read-opt.)  Replication / scalability HBase Pig Hive Chukwa  ~ Google filesystem (GFS) Zoo MapReduce HDFS Keeper Core Avro 27 Monday, August 23, 2010

Hadoop Coordination service  Locking / configuration HBase Pig Hive Chukwa  ~ Google Chubby Zoo MapReduce HDFS Keeper Core Avro 28 Monday, August 23, 2010

Hadoop HBase Pig Hive Chukwa Column-oriented, sparse store Zoo  Batch & random access MapReduce HDFS Keeper  ~ Google BigTable Core Avro 29 Monday, August 23, 2010

Hadoop HBase Pig Hive Chukwa Data flow language Zoo  Procedural SQL-inspired lang. MapReduce HDFS Keeper  Execution environment Core Avro 30 Monday, August 23, 2010

Hadoop HBase Pig Hive Chukwa Distributed data warehouse Zoo  SQL-like query language MapReduce HDFS Keeper  Data mgmt / query execution Core Avro 31 Monday, August 23, 2010

Hadoop … … more HBase Pig Hive Chukwa Zoo MapReduce HDFS Keeper Core Avro 32 Monday, August 23, 2010

MapReduce  Mapper: (k1, v1)  (k2, v2)[]  E.g., (void, textline : string)  (first : string, count : int)  Reducer: (k2, v2[])  (k3, v3)[]  E.g., (first : string, counts : int[])  (first : string, total : int)  Combiner: (k2, v2[])  (k2, v2)[]  Partition: (k2, v2)  int 33 Monday, August 23, 2010

Mapper interface interface Mapper<K1, V1, K2, V2> {  void configure (JobConf conf); void map (K1 key, V1 value,  OutputCollector<K2, V2> out, Reporter reporter); void close();  }  Initialize in configure()  Clean-up in close()  Emit via out.collect(key,val) any time 34 Monday, August 23, 2010

Reducer interface interface Reducer<K2, V2, K3, V3> {  void configure (JobConf conf); void reduce (  K2 key, Iterator<V2> values, OutputCollector<K3, V3> out, Reporter reporter);  void close(); }  Initialize in configure()  Clean-up in close()  Emit via out.collect(key,val) any time 35 Monday, August 23, 2010

Some canonical examples  Histogram-type jobs:  Graph construction (bucket = edge)  K-means et al. (bucket = cluster center)  Inverted index:  Text indices  Matrix transpose  Sorting  Equi-join  More details in part 2 36 Monday, August 23, 2010

Equi-joins “Reduce-side” (Smith, 7) MAP (Jones, 7) (Brown, 7) (Davis, 3) (Dukes, 5) (Black, 3) (Gruhl, 7) (Sales, 3) (Devel, 7) MAP (Acct., 5) 37 Monday, August 23, 2010

Equi-joins “Reduce-side” (Smith, 7) 7: (  , (Smith)) MAP (Jones, 7) (Brown, 7) (Davis, 3) (Dukes, 5) (Black, 3) (Gruhl, 7) (Sales, 3) (Devel, 7) MAP (Acct., 5) 37 Monday, August 23, 2010

Equi-joins “Reduce-side” (Smith, 7) 7: (  , (Smith)) MAP (Jones, 7) (Brown, 7) (Davis, 3) (Dukes, 5) (Black, 3) (Gruhl, 7) (Sales, 3) (Devel, 7) 7: (  , (Devel)) MAP (Acct., 5) 37 Monday, August 23, 2010

Equi-joins “Reduce-side” (Smith, 7) 7: (  , (Smith)) MAP -OR- (Jones, 7) (7,  ): (Smith) (Brown, 7) (Davis, 3) (Dukes, 5) (Black, 3) (Gruhl, 7) (Sales, 3) (Devel, 7) 7: (  , (Devel)) MAP -OR- (Acct., 5) (7,  ): (Devel) 37 Monday, August 23, 2010

Equi-joins “Reduce-side” (Smith, 7) MAP 7: (  , (Smith)) (Jones, 7) 7: (  , (Jones)) (Brown, 7) 7: (  , (Brown)) (Davis, 3) (Dukes, 5) (Black, 3) (Gruhl, 7) 7: (  , (Gruhl)) (Sales, 3) (Devel, 7) 7: (  , (Devel)) MAP (Acct., 5) 38 Monday, August 23, 2010

Equi-joins “Reduce-side” (Smith, 7) MAP 7: (  , (Smith)) (Jones, 7) 7: (  , (Jones)) 7: {(  , (Smith)), (Brown, 7) 7: (  , (Brown)) (  , (Jones)), (Davis, 3) (  , (Brown)), SHUF. (  , (Gruhl)), (Dukes, 5) (  , (Devel)) } (Black, 3) (Gruhl, 7) 7: (  , (Gruhl)) (Sales, 3) (Devel, 7) 7: (  , (Devel)) MAP (Acct., 5) 38 Monday, August 23, 2010

Equi-joins “Reduce-side” (Smith, 7) MAP 7: (  , (Smith)) (Jones, 7) 7: (  , (Jones)) 7: {(  , (Smith)), (Brown, 7) 7: (  , (Brown)) (  , (Jones)), (Davis, 3) (  , (Brown)), SHUF. (  , (Gruhl)), (Dukes, 5) (  , (Devel)) } (Black, 3) RED. (Gruhl, 7) 7: (  , (Gruhl)) (Smith, Devel), (Sales, 3) (Jones, Devel), (Brown, Devel), (Devel, 7) 7: (  , (Devel)) MAP (Gruhl, Devel) (Acct., 5) 38 Monday, August 23, 2010

HDFS & MapReduce processes HOST 1 HOST 1 HOST 2 HOST 2 SPLIT 0 SPLIT 4 SPLIT 3 SPLIT 2 Replica 2/3 Replica 1/3 Replica 3/3 Replica 2/3 MAPPER MAPPER HOST 3 HOST 3 C C SPLIT 3 HOST 0 HOST 0 SPLIT 0 REDUCER Replica 1/3 SPLIT 2 SPLIT 1 Replica 3/3 SPLIT 0 SPLIT 1 Replica 3/3 Replica 1/3 Replica 1/3 Replica 2/3 MAPPER C MAPPER SPLIT 4 C SPLIT 3 Replica 2/3 Replica 2/3 HOST 5 HOST 4 HOST 6 39 Monday, August 23, 2010

HDFS & MapReduce processes HOST 1 HOST 1 DATA HOST 2 HOST 2 DATA NODE SPLIT 0 SPLIT 4 NODE SPLIT 3 SPLIT 2 Replica 2/3 Replica 1/3 Replica 3/3 Replica 2/3 MAPPER MAPPER HOST 3 HOST 3 C DATA C SPLIT 3 HOST 0 HOST 0 NODE DATA SPLIT 0 REDUCER Replica 1/3 SPLIT 2 SPLIT 1 Replica 3/3 NODE SPLIT 0 SPLIT 1 Replica 3/3 Replica 1/3 Replica 1/3 Replica 2/3 MAPPER C MAPPER SPLIT 4 C SPLIT 3 Replica 2/3 Replica 2/3 HOST 5 DN HOST 4 HOST 6 DN DN 39 Monday, August 23, 2010

HDFS & MapReduce processes HOST 1 HOST 1 DATA HOST 2 HOST 2 DATA NODE SPLIT 0 SPLIT 4 NODE SPLIT 3 SPLIT 2 Replica 2/3 Replica 1/3 Replica 3/3 Replica 2/3 MAPPER MAPPER HOST 3 HOST 3 C DATA C SPLIT 3 HOST 0 HOST 0 NODE DATA SPLIT 0 REDUCER Replica 1/3 SPLIT 2 SPLIT 1 Replica 3/3 NODE SPLIT 0 SPLIT 1 Replica 3/3 Replica 1/3 Replica 1/3 Replica 2/3 MAPPER C MAPPER SPLIT 4 C SPLIT 3 Replica 2/3 Replica 2/3 NAME HOST 5 DN NODE HOST 4 HOST 6 DN DN 39 Monday, August 23, 2010

Large-scale Data Mining: MapReduce and beyond Part 1: Basics - PowerPoint PPT Presentation

Large-scale Data Mining: MapReduce and beyond Part 1: Basics Spiros Papadimitriou, Google Jimeng Sun, IBM Research Rong Yan, Facebook Monday, August 23, 2010 Data everywhere 2 Monday, August 23, 2010 Data everywhere Flickr (3 billion

Large-scale Data Mining: MapReduce and Beyond Part 2: Algorithms Spiros Papadimitriou, IBM

Web Mining Web Mining Web Mining Web Mining Web mining is the use of data mining techniques

MapReduce 320302 Databases & Web Services (P. Baumann) 1 Why MapReduce? Motivation: Large

MapReduce Andrew Crotty Alex Galakatos What is MapReduce? MapReduce is a framework for:

Mrs: MapReduce for Scientific Computing in Python Andrew McNabb, Jeff Lund , and Kevin Seppi

Cutting MapReduce Cost with Spot Market Huan Liu Accenture Technology Labs Why spot market? 2

COMP9313: Big Data Management MapReduce Data Structure in MapReduce Key-value pairs are the

Lecture 16: Overview of MapReduce MapReduce is a parallel, distributed programming model and

Large-Scale Data Engineering Frameworks Beyond MapReduce event.cwi.nl/lsde THE HADOOP ECOSYSTEM

MapReduce and Frequent Itemsets Mining Yang Wang 1 MapReduce (Hadoop) Programming model

Introduction What is data mining? to Data Mining: On what kind of data? Data Mining

Phoenix Rebirth: Scalable MapReduce on a Large-Scale Shared-Memory System Richard Yoo, Anthony

MapReduce and Beyond Part 3: Applications Spiros Papadimitriou, IBM Research Jimeng Sun, IBM

MapReduce 340151 Big Data & Cloud Services (P. Baumann) 1 Overview MapReduce : the

Motivation Large-Scale Data Processing MapReduce: Want to use 1000s of CPUs Simplified

A large-scale International IPv6 Network A large-scale International IPv6 Network www.6net.org

The Center of the Premium Video Economy. AT THE HEART OF PREMIUM VIDEO ECONOMY PROVIDING

End-to-End Delay Guarantees for Real-Time Systems using SDN Rakesh Kumar , Monowar Hasan, Smruti

Open. Scalable. Intelligent? Free Mind Unstructured Open Too Source Ended For Business

Data-Intensive Distributed Computing CS 431/631 451/651 (Fall 2019) Part 1: MapReduce Algorithm

A Scourge is Born The History of the Microcomputer -- Invention and Evolution Intel 1968 -

[Primer] The FCC (mostly) (de)regulates services, not networks Network vs. service Service:

Export 101 Webinar October 21, 2014 Rachid S ayouty, Director US Commercial S ervice Los

The Tenth Circuit Appeal Focusing on FCC Authority to Regulate Information Services Provided By

Large-scale Data Mining: MapReduce and beyond Part 1: Basics - PowerPoint PPT Presentation

Large-scale Data Mining: MapReduce and beyond Part 1: Basics Spiros Papadimitriou, Google Jimeng Sun, IBM Research Rong Yan, Facebook Monday, August 23, 2010 Data everywhere 2 Monday, August 23, 2010 Data everywhere Flickr (3 billion

Large-scale Data Mining: MapReduce and Beyond Part 2: Algorithms Spiros Papadimitriou, IBM

Web Mining Web Mining Web Mining Web Mining Web mining is the use of data mining techniques

MapReduce 320302 Databases &amp; Web Services (P. Baumann) 1 Why MapReduce? Motivation: Large

MapReduce Andrew Crotty Alex Galakatos What is MapReduce? MapReduce is a framework for:

Mrs: MapReduce for Scientific Computing in Python Andrew McNabb, Jeff Lund , and Kevin Seppi

Cutting MapReduce Cost with Spot Market Huan Liu Accenture Technology Labs Why spot market? 2

COMP9313: Big Data Management MapReduce Data Structure in MapReduce Key-value pairs are the

Lecture 16: Overview of MapReduce MapReduce is a parallel, distributed programming model and

Large-Scale Data Engineering Frameworks Beyond MapReduce event.cwi.nl/lsde THE HADOOP ECOSYSTEM

MapReduce and Frequent Itemsets Mining Yang Wang 1 MapReduce (Hadoop) Programming model

Introduction What is data mining? to Data Mining: On what kind of data? Data Mining

Phoenix Rebirth: Scalable MapReduce on a Large-Scale Shared-Memory System Richard Yoo, Anthony

MapReduce and Beyond Part 3: Applications Spiros Papadimitriou, IBM Research Jimeng Sun, IBM

MapReduce 340151 Big Data &amp; Cloud Services (P. Baumann) 1 Overview MapReduce : the

Motivation Large-Scale Data Processing MapReduce: Want to use 1000s of CPUs Simplified

A large-scale International IPv6 Network A large-scale International IPv6 Network www.6net.org

The Center of the Premium Video Economy. AT THE HEART OF PREMIUM VIDEO ECONOMY PROVIDING

End-to-End Delay Guarantees for Real-Time Systems using SDN Rakesh Kumar , Monowar Hasan, Smruti

Open. Scalable. Intelligent? Free Mind Unstructured Open Too Source Ended For Business

Data-Intensive Distributed Computing CS 431/631 451/651 (Fall 2019) Part 1: MapReduce Algorithm

A Scourge is Born The History of the Microcomputer -- Invention and Evolution Intel 1968 -

[Primer] The FCC (mostly) (de)regulates services, not networks Network vs. service Service:

Export 101 Webinar October 21, 2014 Rachid S ayouty, Director US Commercial S ervice Los

The Tenth Circuit Appeal Focusing on FCC Authority to Regulate Information Services Provided By

MapReduce 320302 Databases & Web Services (P. Baumann) 1 Why MapReduce? Motivation: Large

MapReduce 340151 Big Data & Cloud Services (P. Baumann) 1 Overview MapReduce : the