Monitoring I/O on Data-Intensive Clusters
Visualizing Disk Reads and Writes on Hadoop MapReduce Jobs
Thursday, July 31
Joel Ornstein, Joshua Long, Carson Wiens
Mentors: Steve Senator, Tim Randles, Vaughan Clinton, Mike Mason, Graham Van Heule – HPC 3
LA-UR-14-26019

Background
Motivation:
– I/O Intensive Jobs
  • Large amounts of scientific data
Traditional HPC
– Limiting factor mostly lies in processing speed
I/O Intensive Jobs
– Bottlenecked by read/write disk speed
– MapReduce
  • Move jobs to the data (instead of vice-versa)

MapReduce
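The slide's diagram did not survive extraction. As a rough stand-in, the sketch below walks through the map, shuffle, and reduce phases on a word count, entirely in memory. It is only an illustration of the programming model, not Hadoop's implementation; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the input lines are made up for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit one (word, 1) pair per word in an input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the grouped values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map over every record, shuffle, then reduce each key group."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input lines, chosen only for illustration.
counts = run_job(["disk read disk write", "disk read"])
```

In a real cluster the map tasks would be scheduled on the nodes that already hold the input blocks, which is the "move jobs to the data" idea from the Background slide.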

I/O Monitoring
Why?
– Nodes break
– Jobs run without using the specified resources
Deliverables
– Programs that are helpful for monitoring a Hadoop 2.3 cluster
  • Splunk App for HadoopOps
  • Ganglia
  • Other methods
– Data tests
  • bonnie++
  • teragen and terasort

Environment
• 11-node CentOS cluster
  – 1 head node and 10 compute nodes
• FDR InfiniBand, 56 Gb/s
  – IP over IB
  – Faster than disks can read/write
• Hadoop 2.3.0
• MRv2/YARN
  – Yet Another Resource Negotiator
  – Runs MapReduce jobs in Hadoop environment
• Java 1.6
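To put the "faster than disks can read/write" point in numbers, the sketch below converts the 56 Gb/s link rate from the slide into MB/s and compares it with a single disk. The disk figure is an assumption picked for illustration (the slides do not state the cluster's disk throughput), and the link figure is the nominal rate, which IP over IB will not fully reach in practice.

```python
LINK_GBIT_PER_S = 56          # FDR InfiniBand nominal rate, from the slide
ASSUMED_DISK_MB_PER_S = 150   # hypothetical single-disk throughput, not from the slides


def gbit_to_mbyte_per_s(gbit_per_s):
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbit_per_s * 1000 / 8


link_mb_per_s = gbit_to_mbyte_per_s(LINK_GBIT_PER_S)
ratio = link_mb_per_s / ASSUMED_DISK_MB_PER_S

print(f"link: {link_mb_per_s:.0f} MB/s, about {ratio:.0f}x the assumed disk rate")
```

Under these assumptions the network has ample headroom, so the disks rather than the interconnect are the expected bottleneck.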

Monitoring Tools
Splunk
– software for searching and analyzing logs
– able to generate graphs, charts, gauges, etc.
– web interface
Ganglia
– software for monitoring clusters
– generates plots from input
– web interface
iostat
– outputs I/O statistics for devices
– command-line interface

Splunk App for HadoopOps

Ganglia

iostat
iostat -kxy 1 2
Output annotations: kB read per second, kB written per second
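The two annotated columns correspond to the `rkB/s` and `wkB/s` fields of `iostat -kx` output. The sketch below is one possible way to pull those two values out per device; it is not the presenters' tooling. It looks the columns up by header name because the column order differs between sysstat versions. The function names and the `SAMPLE` text (including its numbers) are invented for illustration.

```python
import subprocess


def run_iostat():
    """Run the command from the slide and return its text output."""
    result = subprocess.run(
        ["iostat", "-kxy", "1", "2"], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_iostat(text):
    """Return one dict per device table, mapping device -> (rkB/s, wkB/s)."""
    reports = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the current device table
            continue
        if fields[0].rstrip(":") == "Device":
            columns = fields  # header row: remember the column names
            reports.append({})
            continue
        if columns is not None:
            row = dict(zip(columns[1:], fields[1:]))
            reports[-1][fields[0]] = (float(row["rkB/s"]), float(row["wkB/s"]))
    return reports


# Made-up sample in the older sysstat layout; the values are not measurements.
SAMPLE = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.00    0.00    2.00   10.00    0.00   87.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     5.00  120.00   30.00 61440.00 15360.00  1024.00     1.50   10.00   2.00  30.00
"""

reports = parse_iostat(SAMPLE)
```

On a node with sysstat installed, `parse_iostat(run_iostat())` would apply the same parsing to live output.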

Methods
Benchmarking
– bonnie++
– measure disk I/O
Hadoop jobs
– teragen
– terasort
Hadoop jobs with remote data
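For the teragen and terasort jobs, teragen takes a row count rather than a byte size, and each generated row is 100 bytes. The sketch below shows one way to turn a target data size into that row count and assemble the two command lines. The slides do not give the actual invocations or data sizes, so the jar filename and HDFS paths here are placeholders and the helper names are hypothetical.

```python
TERAGEN_ROW_BYTES = 100  # teragen writes fixed-size 100-byte rows


def teragen_rows(target_bytes):
    """Number of rows needed to generate roughly target_bytes of data."""
    return target_bytes // TERAGEN_ROW_BYTES


def teragen_command(target_bytes, out_dir, examples_jar):
    """Argument list for a teragen run; paths are supplied by the caller."""
    rows = str(teragen_rows(target_bytes))
    return ["hadoop", "jar", examples_jar, "teragen", rows, out_dir]


def terasort_command(in_dir, out_dir, examples_jar):
    """Argument list for sorting previously generated teragen output."""
    return ["hadoop", "jar", examples_jar, "terasort", in_dir, out_dir]


# Placeholder jar name and HDFS paths, assumed for illustration only.
JAR = "hadoop-mapreduce-examples-2.3.0.jar"
gen = teragen_command(10**12, "/benchmarks/tera-in", JAR)
sort = terasort_command("/benchmarks/tera-in", "/benchmarks/tera-out", JAR)
```

The lists can be passed to `subprocess.run` on a node where the `hadoop` command is on the path.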
