Hadoop Map Reduce


  1. Hadoop Map Reduce

  2. MapReduce 2-in-1:
     - A programming paradigm
     - A query execution engine
     - A kind of functional programming
     We focus on the MapReduce execution engine of Hadoop, through YARN.

  3. Overview [diagram: Developer -> MR Driver Program -> MR Job -> Master node -> Slave nodes]

  4. Code Example
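     A minimal sketch of the classic WordCount map and reduce functions in the standard Hadoop MapReduce API (class names are illustrative, not necessarily the exact code on the slide); the matching driver appears after slide 6:

        import java.io.IOException;
        import java.util.StringTokenizer;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.Reducer;

        // Map: emit (word, 1) for every word in the input line.
        class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          private final Text word = new Text();

          @Override
          protected void map(LongWritable key, Text value, Context context)
              throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
              word.set(tokens.nextToken());
              context.write(word, ONE);
            }
          }
        }

        // Reduce: sum the counts emitted for each word.
        class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
          @Override
          protected void reduce(Text key, Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
              sum += v.get();
            }
            context.write(key, new IntWritable(sum));
          }
        }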

  5. Job Execution Overview
     Driver -> Job submission -> Job preparation -> Map -> Shuffle -> Reduce -> Cleanup

  6. Job Submission
     Execution location: the driver node.
     A driver machine should have the following:
     - Compatible Hadoop binaries
     - Cluster configuration files
     - Network access to the master node
     The driver collects job information from the user:
     - Input and output paths
     - Map, reduce, and any other functions
     - Any additional user configuration
     It packages all of this in a Hadoop Configuration (a driver sketch follows below).
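     A minimal driver sketch showing how this information is packaged and submitted (the class names WordCountDriver/WordCountMapper/WordCountReducer are assumed from the earlier sketch; the paths come from the command line):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

        public class WordCountDriver {
          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();            // loads the cluster configuration files
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCountDriver.class);            // the JAR with the user-defined functions
            job.setMapperClass(WordCountMapper.class);           // user-defined map function
            job.setReducerClass(WordCountReducer.class);         // user-defined reduce function
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));     // input path
            FileOutputFormat.setOutputPath(job, new Path(args[1]));   // output path
            System.exit(job.waitForCompletion(true) ? 0 : 1);    // ship the job to the cluster and wait
          }
        }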

  7. Hadoop Configuration
     A set of key-value pairs (Key: String, Value: String), for example:
     - Input:   hdfs://user/eldawy/README.txt
     - Output:  hdfs://user/eldawy/wordcount
     - Mapper:  edu.ucr.cs.cs226.eldawy.WordCount
     - Reducer: …
     - …
     Along with the JAR file of user-defined functions, the configuration is serialized over the network to the master node.
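     A small sketch of writing and reading such key-value pairs (the property names below are illustrative, not the ones the framework actually uses):

        import org.apache.hadoop.conf.Configuration;

        public class ConfExample {
          public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("wordcount.input", "hdfs://user/eldawy/README.txt");   // keys and values are strings
            conf.set("wordcount.output", "hdfs://user/eldawy/wordcount");
            conf.setInt("wordcount.min.length", 3);                         // typed setters store strings too
            System.out.println(conf.get("wordcount.input"));                // read a value back
          }
        }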

  8. Job Preparation
     Runs on the master node and gets the job ready for parallel execution:
     - Collects the JAR file that contains the user-defined functions, e.g., Map and Reduce
     - Writes the JAR and configuration to HDFS to be accessible by the executors
     - Looks at the input file(s) to decide how many map tasks are needed
     - Makes some sanity checks
     Finally, it pushes the BRB (Big Red Button).

  9. Job Preparation [diagram: the master node takes the Configuration and JAR file, stores them in HDFS, and calls InputFormat#getSplits() to produce FileInputSplits (path, start, end): Split 1 -> Mapper 1, Split 2 -> Mapper 2, ..., Split M -> Mapper M]
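     The split computation is normally done by the framework, but it can be invoked directly for illustration; a sketch assuming text input (in the new MapReduce API the split class is called FileSplit):

        import java.util.List;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapreduce.InputSplit;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.input.FileSplit;
        import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

        public class SplitInspector {
          public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration());
            FileInputFormat.addInputPath(job, new Path(args[0]));    // the input file(s)
            // Ask the input format for the splits, as the master does during job preparation.
            List<InputSplit> splits = new TextInputFormat().getSplits(job);
            for (InputSplit split : splits) {
              FileSplit fs = (FileSplit) split;
              // Each split records a path and a byte range (start, length) for one mapper.
              System.out.println(fs.getPath() + " start=" + fs.getStart() + " length=" + fs.getLength());
            }
          }
        }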

  10. Map Phase
      Runs in parallel on the worker nodes. Each of the M mappers:
      - Reads its input
      - Applies the map function
      - Applies the combine function (if configured; see the one-line setting below)
      - Stores the map output
      There is no guaranteed ordering for processing the input splits.
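      The combine step is optional and is enabled with a single setting on the job; a sketch, assuming the WordCountReducer from the earlier example (a sum is safe to pre-aggregate locally):

         // Added to the driver sketch from slide 6: run the reducer as a combiner so each
         // mapper pre-aggregates its (word, 1) pairs locally before the shuffle.
         job.setCombinerClass(WordCountReducer.class);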

  11. Map Phase [diagram: the master node assigns input splits IS 1, IS 2, ..., IS M to mappers running in parallel on the worker nodes]

  12. Mapper
      Each map task (lifecycle sketched below):
      - Reads the job configuration and task information (mostly, the InputSplit)
      - Instantiates an object of the Mapper class
      - Instantiates a record reader for the assigned input split
      - Calls Mapper#setup(Context)
      - Reads records one-by-one from the record reader and passes them to the map function
      - The map function writes its output to the context
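      The per-record loop above is essentially what the default Mapper#run(Context) does; roughly:

         // Roughly the default org.apache.hadoop.mapreduce.Mapper#run(Context):
         // setup once, call map() for every record the record reader delivers, then cleanup.
         public void run(Context context) throws IOException, InterruptedException {
           setup(context);                     // Mapper#setup(Context)
           try {
             while (context.nextKeyValue()) {  // the record reader advances to the next record
               map(context.getCurrentKey(), context.getCurrentValue(), context);
             }
           } finally {
             cleanup(context);                 // Mapper#cleanup(Context)
           }
         }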

  13. MapContext
      - Keeps track of which input split is being read and which records are being processed
      - Holds all the job configuration and some additional information about the map task
      - Materializes the map output
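      Inside a map function, the context exposes all of this; a small illustrative mapper (the class name and configuration key are assumptions, not part of the slides):

         import java.io.IOException;
         import org.apache.hadoop.conf.Configuration;
         import org.apache.hadoop.io.IntWritable;
         import org.apache.hadoop.io.LongWritable;
         import org.apache.hadoop.io.Text;
         import org.apache.hadoop.mapreduce.InputSplit;
         import org.apache.hadoop.mapreduce.Mapper;

         class ContextDemoMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
           @Override
           protected void map(LongWritable key, Text value, Context context)
               throws IOException, InterruptedException {
             Configuration conf = context.getConfiguration();   // the job configuration
             InputSplit split = context.getInputSplit();        // which input split this task reads
             String tag = conf.get("demo.tag", "");             // read a user setting (illustrative key)
             context.write(new Text(tag + split.toString()), new IntWritable(1));  // materialize map output
           }
         }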
