Transferring a Petabyte in a Day
Raj Kettimuthu, Zhengchun Liu, David Wheeler, Ian Foster, Katrin Heitmann, Franck Cappello
Huge amounts of data from extreme-scale simulations and experiments
Systems have different capabilities
SC16 demonstration
[Architecture diagram: cosmology simulation on Mira (ANL, 29 billion particles) with snapshots pulled over the 100 Gb/s ANL–NCSA link to Blue Waters (NCSA) for first-level data analytics and visualization; all snapshots transferred once at 1 PB/day; second-level data analytics and archive (NERSC, ORNL); >1 PB of DDN storage at the SC16 NCSA booth with visualization streaming to the booth display (NCSA, EVL)]
Objectives
§ Running a state-of-the-art cosmology simulation and analyzing all snapshots
  – Currently only one in every five or ten snapshots is stored or communicated
§ Combining two different types of systems (simulation on Mira and data analytics on Blue Waters)
  – Geographically distributed, different administrative domains
  – Run an extreme-scale simulation and analyze the output in a pipelined fashion
§ Many previous studies have varied transfer parameters such as concurrency and parallelism to improve data transfer performance
  – We also demonstrate the value of varying the file size, which provides additional flexibility for optimization
§ We demonstrate these methods in the context of dedicated data transfer nodes and a 100 Gb/s network
Science case (K. Heitmann et al.)
[Sky survey images: ROSAT (X-ray), WMAP (microwave), Fermi (gamma ray), SDSS (optical)]
Demo environment
§ Source of the data: the GPFS parallel file system on the Mira supercomputer at Argonne
§ Destination: the Lustre parallel file system on the Blue Waters supercomputer at NCSA
§ Argonne has 12 data transfer nodes (DTNs) dedicated to wide-area data transfer
§ NCSA has 28 DTNs
§ Each DTN runs a GridFTP server
§ Globus orchestrates our data transfers (see the sketch below)
  – Automatic fault recovery and load balancing among the available GridFTP servers on both ends
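As a rough illustration of how such a Globus-orchestrated transfer can be requested programmatically (a minimal sketch, not the exact setup used in the demo; the endpoint UUIDs, paths, and access token are placeholders), using the globus-sdk Python package:

```python
import globus_sdk

# Placeholder endpoint UUIDs; the demo used the ALCF and NCSA DTN endpoints.
SRC_ENDPOINT = "SOURCE-ENDPOINT-UUID"
DST_ENDPOINT = "DEST-ENDPOINT-UUID"

# Assumes a transfer access token already obtained via a Globus Auth flow.
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer("TRANSFER-ACCESS-TOKEN")
)

tdata = globus_sdk.TransferData(
    tc,
    SRC_ENDPOINT,
    DST_ENDPOINT,
    label="cosmology snapshots Mira to Blue Waters",
    verify_checksum=True,  # end-to-end integrity verification (discussed later)
)
# Hypothetical placeholder paths on the GPFS and Lustre file systems.
tdata.add_item("/path/on/mira/gpfs/snapshots/",
               "/path/on/bluewaters/lustre/snapshots/",
               recursive=True)

task = tc.submit_transfer(tdata)
print("submitted transfer task:", task["task_id"])
```

Globus then schedules the transfer across the available GridFTP servers and retries failed files automatically.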
GridFTP concurrency and parallelism
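For the command-line path, concurrency (number of files in flight at once) and parallelism (TCP streams per file) are the two knobs varied here. A minimal sketch wrapping globus-url-copy, assuming it is installed and that the source and destination URLs below are placeholders:

```python
import subprocess

# Placeholder GridFTP URLs; substitute the actual source and destination DTN paths.
SRC = "gsiftp://src-dtn.example.org/path/to/snapshots/"
DST = "gsiftp://dst-dtn.example.org/scratch/snapshots/"

def run_transfer(concurrency: int, parallelism: int, pipelining: bool = True) -> None:
    """Invoke globus-url-copy with explicit tuning parameters.

    -cc : number of files transferred concurrently
    -p  : number of parallel TCP streams per file
    -pp : pipeline transfer commands (helps when there are many small files)
    """
    cmd = ["globus-url-copy", "-cc", str(concurrency), "-p", str(parallelism)]
    if pipelining:
        cmd.append("-pp")
    cmd += ["-r", SRC, DST]  # -r: recursive directory transfer
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Sweep a few settings, in the spirit of the tuning experiments on later slides.
    for cc, p in [(4, 4), (8, 4), (16, 8)]:
        run_transfer(cc, p)
```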
GridFTP pipelining
[Diagram: traditional vs. pipelined command issuance]
Impact of tuning parameters
Impact of tuning parameters
Transfer performance
Checksum verification
§ The 16-bit TCP checksum is inadequate for detecting data corruption, and corruption can also occur during file system operations
§ Globus pipelines the transfer and checksum computation (sketched below)
  – Checksum computation of the ith file happens in parallel with the transfer of the (i+1)th file
[Diagram: transfer pipeline (b_trs, T_trs, T_trs, ...) overlapped with verification pipeline (T_ck, T_ck, ...)]
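A minimal sketch of this overlap (not Globus's actual implementation): transfer_file is a hypothetical placeholder for the GridFTP transfer, and a single checksum worker runs one file behind the transfer loop:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def transfer_file(path: str) -> str:
    """Hypothetical placeholder: move one file and return its path at the destination."""
    # In the demo this step is performed by GridFTP/Globus, not by application code.
    return path

def checksum_file(path: str) -> str:
    """Compute an MD5 checksum of the file at the destination."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def pipelined_transfer(files):
    """Overlap the checksum of file i with the transfer of file i+1."""
    with ThreadPoolExecutor(max_workers=1) as checker:
        pending = None                  # verification job for the previous file
        for path in files:
            dest = transfer_file(path)  # transfer file i+1 ...
            if pending is not None:
                pending.result()        # ... while file i is being verified
            pending = checker.submit(checksum_file, dest)
        if pending is not None:
            pending.result()            # verify the last file
```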
Checksum overhead
Impact of checksum failures
A model to find the optimal number of files
§ A simple linear model of the transfer time for a single file of size x:
  T_trs = a_trs · x + b_trs, where a_trs is the unit transfer time and b_trs the per-file startup cost
§ Likewise for the checksum: T_ck = a_ck · x + b_ck, where a_ck is the unit checksum time and b_ck the checksum startup cost
§ Assuming the unit checksum time is less than the unit transfer time, the total time T to transfer n files with one GridFTP process is
  T = n·T_trs + T_ck + b_trs = n(a_trs·x + b_trs) + a_ck·x + b_ck + b_trs
§ With S total bytes, N total files, and concurrency cc: x = S/N and n = N/cc
§ The time T(N) to transfer all N files is then
  T(N) = (a_trs·S)/cc + (b_trs·N)/cc + (a_ck·S)/N + b_ck + b_trs
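Setting dT/dN = b_trs/cc − a_ck·S/N² to zero gives an optimal file count N* = sqrt(a_ck·S·cc / b_trs), and thus an optimal file size x* = S/N*. A small sketch that evaluates the model; all parameter values are illustrative assumptions, not measurements from the demo:

```python
import math

def transfer_time(N, S, cc, a_trs, b_trs, a_ck, b_ck):
    """T(N) = a_trs*S/cc + b_trs*N/cc + a_ck*S/N + b_ck + b_trs (model above)."""
    return a_trs * S / cc + b_trs * N / cc + a_ck * S / N + b_ck + b_trs

def optimal_file_count(S, cc, b_trs, a_ck):
    """Solve dT/dN = b_trs/cc - a_ck*S/N**2 = 0 for N."""
    return math.sqrt(a_ck * S * cc / b_trs)

if __name__ == "__main__":
    # Illustrative parameter values only.
    S = 1e15        # total bytes (~1 PB)
    cc = 32         # concurrent GridFTP processes
    a_trs = 2e-9    # unit transfer time per process (s/byte, ~500 MB/s)
    b_trs = 2.0     # per-file transfer startup cost (s)
    a_ck = 1e-9     # unit checksum time (s/byte, ~1 GB/s)
    b_ck = 0.5      # per-file checksum startup cost (s)

    N_opt = optimal_file_count(S, cc, b_trs, a_ck)
    x_opt = S / N_opt
    print(f"optimal file count ~ {N_opt:.0f}, file size ~ {x_opt / 2**30:.0f} GiB")
    print(f"predicted total time ~ "
          f"{transfer_time(N_opt, S, cc, a_trs, b_trs, a_ck, b_ck) / 3600:.1f} h")
```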
Evaluation of the model
Conclusion
§ Our experiences attempting to transfer one petabyte of science data within one day
§ Exploration to identify parameter values that yield maximum performance for Globus transfers
§ Experiences transferring data while the data are being produced by the simulation
  – Both with and without end-to-end integrity verification
§ Achieved 99.8% of our one-petabyte-per-day goal without integrity verification and 78% with integrity verification
§ Finally, we used a model-based approach to identify the optimal file size for transfers
  – Achieved 97% of our goal with integrity verification by choosing the appropriate file size
§ A useful lesson in the time-constrained transfer of large datasets
Questions