Introducing MapReduce to High End Computing
Grant Mackey, Julio Lopez, Saba Sehrish, John Bent, Salman Habib, Jun Wang
University of Central Florida, Carnegie Mellon University, Los Alamos National Laboratory

Scientific Applications
As the computational scale of scientific applications grows, so does the amount of data. Dealing with that amount of data becomes difficult.
• Data analytics become difficult
• The data becomes too large to move
• Applications become resource intensive
• More difficult to program for
• Do the older existing solutions scale?

Scientific Applications
Bioinformatics (Basic Local Alignment Search Tool)
• Genomics machines generate large datasets (GB~TB)
• Data is manually distributed in parallel through an ad hoc job manager script
• The method of parallelizing BLAST is conceptually a manual MapReduce operation
• Using Hadoop would abstract away the manual parallelization of tasks and would provide task resiliency
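To make the "manual MapReduce" remark concrete, here is a minimal sketch in plain Python, not Hadoop and not BLAST itself. The scoring function `toy_score` is a hypothetical stand-in for an alignment score, and every function name below is an assumption made for illustration. The map step scores each query against one database chunk; the reduce step keeps the best hit per query.

```python
from collections import defaultdict

def toy_score(query, subject):
    """Hypothetical stand-in for an alignment score: shared 3-mer count."""
    kmers = {query[i:i + 3] for i in range(len(query) - 2)}
    return sum(1 for i in range(len(subject) - 2) if subject[i:i + 3] in kmers)

def map_chunk(queries, db_chunk):
    """Map: score every query against one database chunk."""
    for qid, qseq in queries.items():
        for sid, sseq in db_chunk.items():
            yield qid, (toy_score(qseq, sseq), sid)

def reduce_hits(qid, hits):
    """Reduce: keep the best-scoring subject for one query."""
    return qid, max(hits)

def run_job(queries, db_chunks):
    """Drive the job serially; a framework would run the chunks in parallel."""
    grouped = defaultdict(list)
    for chunk in db_chunks:
        for qid, hit in map_chunk(queries, chunk):
            grouped[qid].append(hit)
    return dict(reduce_hits(qid, hits) for qid, hits in grouped.items())

queries = {"q1": "ACGTAC"}
db_chunks = [{"s1": "TTTTTT"}, {"s2": "ACGTAC"}]
best = run_job(queries, db_chunks)
```

In an ad hoc setup, the loop in `run_job` is what the job manager script does by hand. A MapReduce framework would take over the chunk distribution, the grouping by key, and the retry of failed tasks.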

Scientific Applications
Cyber-Security: Real-time network analysis
• In a massively multi-user network environment, petabytes of information can pass over the network in a matter of months
• Need a scalable FS that can accommodate the large streaming datasets
• Network events are data independent
• A programming model that abstracts parallelization from the user is convenient

Scientific Applications
Astrophysics: Halo Finding
Current issues
• Ad hoc: the approach is unique
• Too much data movement
• Parallel halo finding tasks are unreliable
Hadoop solutions
• Provides a standard approach
• No data movement
• Hadoop ensures task resilience

Halo Finding
A method used to find clusters of particles in large astrophysics datasets.

Friends of Friends
The algorithm used to perform halo finding.
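As a rough illustration of the general Friends of Friends idea, the sketch below links any two particles closer than a linking length and reports each connected group as a halo. It is a brute-force O(n²) version using union-find, not the authors' implementation. The function name, the tuple-based particle layout, and the sample values are assumptions.

```python
import math

def friends_of_friends(points, linking_length):
    """Group point indices into halos: two points are 'friends' when their
    distance is at most linking_length, and halos are the connected groups."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # Brute-force pair check; real codes use a spatial index instead.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= linking_length:
                union(i, j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

points = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
halos = friends_of_friends(points, linking_length=0.6)
```

Points 0 and 2 are 1.0 apart, which exceeds the linking length, yet they land in the same halo because both are friends of point 1. That transitive linking is what gives the algorithm its name.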

MapReduce model for Halo-Finding
[Figure: diagram with components labeled HDFS, FoF, M, and R]
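The slide does not say how the work is split between map and reduce, so the following is only one plausible decomposition, offered as a sketch under stated assumptions rather than the authors' design. It assumes particles are partitioned into slabs along x, that `slab_width` is at least the linking length `b`, and that particles near a slab's lower boundary are also copied to the slab below so that no friend pair is lost. All names are hypothetical.

```python
import math
from collections import defaultdict

def local_fof(particles, b):
    """Brute-force Friends of Friends on one partition; returns sets of ids."""
    parent = {pid: pid for pid, _ in particles}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for i, (pi, xi) in enumerate(particles):
        for pj, xj in particles[i + 1:]:
            if math.dist(xi, xj) <= b:
                parent[find(pi)] = find(pj)

    groups = defaultdict(set)
    for pid, _ in particles:
        groups[find(pid)].add(pid)
    return list(groups.values())

def map_particle(pid, pos, slab_width, b):
    """Map: key each particle by its x-slab; also copy it to the slab below
    when it lies within one linking length of the shared boundary."""
    slab = int(pos[0] // slab_width)
    yield slab, (pid, pos)
    if pos[0] - slab * slab_width <= b:
        yield slab - 1, (pid, pos)

def find_halos(particles, slab_width, b):
    # Shuffle: group map output by slab key.
    slabs = defaultdict(list)
    for pid, pos in particles:
        for key, value in map_particle(pid, pos, slab_width, b):
            slabs[key].append(value)
    # Reduce: run FoF independently inside each slab.
    partial = [g for members in slabs.values() for g in local_fof(members, b)]
    # Merge: join partial groups that share a particle id.
    halos = []
    for group in partial:
        for h in [h for h in halos if h & group]:
            halos.remove(h)
            group = group | h
        halos.append(group)
    return sorted(sorted(h) for h in halos)

particles = [(1, (0.9, 0.0, 0.0)), (2, (1.1, 0.0, 0.0)),
             (3, (1.5, 0.0, 0.0)), (4, (3.5, 0.0, 0.0))]
halos = find_halos(particles, slab_width=1.0, b=0.5)
```

Particles 1 and 2 sit in different slabs but end up in the same halo, because particle 2 is copied into the lower slab and the merge step joins the partial groups that share it.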

Experiences
• There is a reason why people think that Hadoop is only good for data mining applications
• There exists little to no functionality for data types beyond text
• The learning curve is steep for applications that deal with other data types, such as binary
• The programmer has to deal with the new programming model and write their own input classes
• The Hadoop community is very active and incredibly helpful and prompt in responding to issues and bugs
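To give a sense of what a hand-written input class for binary data has to do, here is a small Python sketch of a fixed-length record reader. It is not Hadoop code. The record layout (a little-endian 64-bit particle id followed by three 32-bit floats) is a hypothetical example, as are the names `RECORD` and `read_records`.

```python
import io
import struct

# Hypothetical layout: int64 particle id + three float32 coordinates.
RECORD = struct.Struct("<q3f")

def read_records(stream):
    """Yield (id, (x, y, z)) tuples from a stream of fixed-length records."""
    while True:
        buf = stream.read(RECORD.size)
        if not buf:
            return  # clean end of input
        if len(buf) < RECORD.size:
            raise ValueError("truncated record at end of stream")
        pid, x, y, z = RECORD.unpack(buf)
        yield pid, (x, y, z)

data = RECORD.pack(1, 0.5, 1.5, 2.5) + RECORD.pack(2, 0.0, 0.25, 4.0)
records = list(read_records(io.BytesIO(data)))
```

With fixed-length records, a file can be cut at any multiple of `RECORD.size` without splitting a record, which is the property a binary input class would rely on when dividing a file among map tasks.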

Conclusion
• Hadoop can be used as a viable resource for large data intensive computing
• Hadoop runs on an inexpensive commodity computing platform, but provides powerful tools for large scale data analytics
• The Hadoop architecture provides task resiliency that other scientific computing methods do not
• Hadoop provides a strict model in which to parallelize a task, and the parallelization has been shown to scale to 1000+ node cluster environments (Amazon’s S3 cluster)
• Hadoop needs more functionality in its API for other data formats

Contact
Grant Mackey: gmackey@cs.ucf.edu
Julio Lopez: jclopez@andrew.cmu.edu
Saba Sehrish: ssehrish@cs.ucf.edu
John Bent: johnbent@lanl.gov
Jun Wang: jwang@cs.ucf.edu
