Modeling Resource-Coupled Computations
Mark Hereld
Computation Institute | Mathematics and Computer Science | Argonne Leadership Computing Facility
Argonne National Laboratory | University of Chicago
Roadmap
• issues and ideas
• models and measurements
• implications and work in progress
Issue
• Given increasingly massive (and complex) datasets…
• how to connect them to computational and display resources that support visualization and analysis?
• holistic approaches to allocating simulation, analysis, visualization, display, storage, and network resources
• create and exploit ways to optimally couple these resources in real time
Common sense
• Analysis engines must be co-located with simulation engines
• …or even, analysis code must be co-located with simulation code, i.e., in situ
• Display resources must be integrated locally with HPC resources
• In general, wide-area applications will become impossible…
• But maybe the situation isn't so dire.
Ideas
• Ideas
• Models
• Measurements
• Consequences
• Future
Mitigation
• More efficient I/O practices
  – Many (most) inefficiencies in R/W rates are amenable to better practices by the application developer
  – In addition to improvements in the performance of I/O libraries
• Better data management
  – Better data layout
• Better brute-force compression methods
  – Uncertainty aware; domain aware
• Leveraging limitations at the destination
  – Pixel real estate (see the downsampling sketch below)
  – Perceptual limitations (and features)
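One concrete instance of "leveraging limitations at the destination" is to block-average a field down to roughly the pixel budget of the display before it leaves the machine. The sketch below is illustrative only, not code from this project; the array shapes, the display budget, and the simple mean-based reduction are all assumptions.

import numpy as np

def downsample_to_display(field, display_budget):
    # Block-average a 3D field so that no axis exceeds the per-axis pixel
    # budget of the destination display.
    factors = [max(1, s // d) for s, d in zip(field.shape, display_budget)]
    # Trim each axis so it divides evenly by its reduction factor.
    trimmed = field[: field.shape[0] // factors[0] * factors[0],
                    : field.shape[1] // factors[1] * factors[1],
                    : field.shape[2] // factors[2] * factors[2]]
    blocks = (trimmed.shape[0] // factors[0], factors[0],
              trimmed.shape[1] // factors[1], factors[1],
              trimmed.shape[2] // factors[2], factors[2])
    return trimmed.reshape(blocks).mean(axis=(1, 3, 5))

# A 256^3 float32 field shipped to a ~128-pixel-per-axis view shrinks 8x
# in volume before it ever reaches the wide-area network.
volume = np.random.rand(256, 256, 256).astype(np.float32)
reduced = downsample_to_display(volume, (128, 128, 128))
print(volume.nbytes / 1e6, "MB ->", reduced.nbytes / 1e6, "MB")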
Coupled Resources
• remote visualization: couple data and large computational resources to remote display hardware
• in situ analysis and visualization: merge simulation and analysis code on a single machine
• co-analysis: couple a simulation on the supercomputer to live analysis on a visualization and analysis platform
Models
• Ideas
• Models
• Measurements
• Consequences
• Future
ALCF Network Architecture
• [diagram] 40K BGP compute nodes -> tree network -> 640 BGP I/O nodes -> Myrinet switch complex (5-stage CLOS, 10GE<->MX conversion, MX<->MX) -> Eureka (100 nodes) and 128 file-server nodes
• Link aggregates: tree network to I/O nodes = 4.3 Tbps; I/O nodes 640 x 10G = 6.4 Tbps; Eureka 100 x 10G = 1 Tbps; file servers 128 x 10G = 1.28 Tbps (Tbps = terabits/sec)
• Theoretical max bandwidth, I/O nodes to Eureka (memory to memory) = 1 Tbps; bi-directional = 2 Tbps
• Theoretical max bandwidth, I/O nodes to file servers (memory to memory) = 1.28 Tbps; bi-directional = 2.56 Tbps
• Theoretical max bandwidth, Eureka to file servers (memory to memory) = 1 Tbps; bi-directional = 2 Tbps
  (the arithmetic behind these aggregates is reproduced below)
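The aggregate figures on this slide follow directly from the per-link rate and the link counts; the short script below only reproduces that arithmetic. Treating the smaller side of each path as the memory-to-memory ceiling is the obvious bottleneck assumption, not a measured result.

# Reproduce the aggregate-bandwidth arithmetic behind the diagram.
LINK_GBPS = 10.0                      # each Myrinet/10GE link

links = {"BGP I/O nodes": 640, "Eureka": 100, "file servers": 128}
agg = {name: n * LINK_GBPS / 1000 for name, n in links.items()}   # Tbps

for name, tbps in agg.items():
    print(f"{name}: {tbps:.2f} Tbps aggregate")

# A memory-to-memory path between two groups is capped by the smaller side.
def path_cap(a, b):
    return min(agg[a], agg[b])

print("I/O nodes -> Eureka:      ", path_cap("BGP I/O nodes", "Eureka"), "Tbps")
print("I/O nodes -> file servers:", path_cap("BGP I/O nodes", "file servers"), "Tbps")
print("Eureka -> file servers:   ", path_cap("Eureka", "file servers"), "Tbps")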
Data Analytics Resource: Eureka
• Data analytics and visualization cluster at ALCF
• (2) head nodes, (100) compute nodes
  – (2) Nvidia Quadro FX5600 graphics cards
  – (2) Xeon E5405 2.00 GHz quad-core processors
  – 32 GB RAM: (8) 4-rank 4 GB DIMMs
  – (1) Myricom 10G CX4 NIC
  – (2) 250 GB local disks; (1) system, (1) minimal scratch
  – 32 GFlops per server
Application
• FLASH
  – multi-physics code: gravitation, nuclear chemistry, MHD
  – laboratory to Universe
• Multiple (~20) simulations
  – 8 km resolution, 10K to 100K blocks each (16 x 16 x 16 voxels)
  – 2 racks (8K cores) of ANL's Intrepid (BGP)
  – typical simulation is 10 runs of 12 hours each
• O(hour) per checkpoint cycle
  – 66% of time spent simulating
  – 33% of time spent in non-overlapping I/O
Measurements
• Ideas
• Models
• Measurements
• Consequences
• Future
FLASH I/O for 1 run (12 hours)
• Total run time = 41557 secs
  – I/O time during run = 14325 secs (34% of the time; rechecked in the sketch below)
  – circa March 2009
• Particle data:
  – 417 files (0.1 GB each) = 41.7 GB
  – time spent writing = 9047 secs (22% of the run time)
• Plot files:
  – 104 files (2.5 GB each); total = 260 GB
  – time spent writing = 3897 secs (9% of the run time)
• Checkpoint files:
  – 10 files (8 GB each); total = 80 GB
  – time spent writing = 1144 secs (3% of the run time)
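The percentages, and the effective write rates they imply, follow from the numbers on this slide; the snippet below only redoes that arithmetic as a check, it is not new measurement data.

# Recompute the I/O fractions and effective write rates from the slide's numbers.
total_run_s = 41557
total_io_s = 14325
print(f"I/O share of the run: {100 * total_io_s / total_run_s:.0f}%")   # ~34%

streams = {
    # name: (data written in GB, seconds spent writing)
    "particle":   (41.7,  9047),
    "plot":       (260.0, 3897),
    "checkpoint": (80.0,  1144),
}

for name, (gb, secs) in streams.items():
    print(f"{name:>10}: {100 * secs / total_run_s:4.1f}% of run time, "
          f"~{gb * 1000 / secs:.1f} MB/s effective")

The per-stream rates make the point behind the earlier "more efficient I/O practices" bullet: the many small particle files sustain only a few MB/s, while the large plot and checkpoint files reach roughly 65-70 MB/s.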
FLASH Supernova Explosion Project
• multiple (~20) simulations
  – 8 km resolution
  – 10K to 100K blocks each (16 x 16 x 16 voxels)
  – 2 racks (8K cores) of ANL's Intrepid (BGP)
  – typical simulation is 10 runs of 12 hours each
  – circa November 2009

  File Type     File Size   #files/Run   #files/Sim   Data Size
  ----------    ---------   ----------   ----------   ---------
  Particle      ~131 MB     ~500         5000         500 GB
  Plot          ~13 GB      40-90        800          10 TB
  Checkpoint    ~42 GB      5-10         100          4.2 TB
Internal Network Experiments
• [diagram] path under test: BGP compute nodes -> tree network -> BGP I/O node -> switch -> analysis node
  (an illustrative throughput-probe sketch follows)
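The slide only shows the path being exercised; the measurement code itself is not in the deck, and the actual experiments on the BG/P I/O nodes would have used their own tooling. As a rough illustration of what such a point-to-point probe looks like, the sketch below pushes fixed-size buffers from a sending process to the analysis node over TCP and reports the achieved rate. The host, port, and buffer sizes are placeholders.

import socket
import time

PORT = 9000               # placeholder port
CHUNK = 4 * 1024 * 1024   # 4 MiB buffers
TOTAL_BYTES = 1 << 30     # send 1 GiB per trial

def sink(bind_addr=""):
    # Run on the analysis node: accept one connection and drain it.
    with socket.create_server((bind_addr, PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            received = 0
            while True:
                data = conn.recv(CHUNK)
                if not data:
                    break
                received += len(data)
            print(f"received {received / 1e9:.2f} GB")

def source(analysis_host):
    # Run on the sending side: time a fixed-volume transfer.
    payload = b"\0" * CHUNK
    sent = 0
    start = time.time()
    with socket.create_connection((analysis_host, PORT)) as conn:
        while sent < TOTAL_BYTES:
            conn.sendall(payload)
            sent += len(payload)
    elapsed = time.time() - start
    print(f"achieved {sent * 8 / elapsed / 1e9:.2f} Gbps")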
Toward middleware to facilitate co-analysis
• [diagram] BGP compute nodes feeding the co-analysis path
Consequences
• Ideas
• Models
• Measurements
• Consequences
• Future
Map Intrepid I/O to Eureka
• Speed up the application
  – offload data organization and disk writes (a rough speedup estimate follows this slide)
• Free co-analysis
  – produce several high-resolution movies
  – data compression
  – multi-time-step caching for window analysis
• Eureka is an accelerator and co-analysis engine at only 1-2% of the cost of Intrepid
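Using the split reported earlier for FLASH (roughly two-thirds of wall-clock time simulating, one-third in non-overlapping I/O), offloading the writes to Eureka bounds the achievable speedup in the usual Amdahl sense. The numbers below are a back-of-the-envelope estimate under that assumption, not a measurement.

# Back-of-the-envelope: how much faster does a FLASH run get if the
# non-overlapping I/O time is hidden behind co-analysis on Eureka?
compute_fraction = 0.66   # time spent simulating (from the FLASH slides)
io_fraction = 0.33        # non-overlapping I/O time

for hidden in (0.5, 0.9, 1.0):   # fraction of I/O successfully offloaded
    new_time = compute_fraction + io_fraction * (1.0 - hidden)
    print(f"hide {hidden:>4.0%} of I/O -> ~{1.0 / new_time:.2f}x speedup")

An upper bound of roughly 1.5x is consistent with the slide's framing: a modest but essentially free acceleration, since Eureka costs only 1-2% of Intrepid.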
Future
• Ideas
• Models
• Measurements
• Consequences
• Future
Works in Progress
• Footprints
  – system-level use-pattern data collection
  – booting up a mini-consortium of resource-monitoring enthusiasts
• in situ
  – Papka: parallel software rendering
  – Tom Peterka and Rob Ross: scaling software rendering algorithms
  – HW-SW rendering comparison experiments
• Co-analysis
  – StarGate experiments
  – Intrepid <-> Eureka communication experiments
  – FLASH test
• Remote visualization
  – pixel shipping experiments and frameworks
[Figure: rendering time vs. number of processes for Eureka (256^3, 512^3, 1024^3, and 2048^3 volumes) and Surveyor (256^3 and 512^3 volumes). Each panel plots time (secs) against Num Procs for Full Frame Time, Render Time, Composite Network Time, Composite Render Time, and Sync State Time.]
Wide Area Experiments
• Pipeline: simulation -> (raw data) -> visualization -> (results) -> interactive display, with control flowing back upstream
• Simulation
  – 4K uniform grid cube; single variable, float
  – 257 GB per time step; 577 time steps; 150 TB total
• Visualization
  – volume rendering; 4K x 4K pixels
• Interactive display
  – large tiled display; navigation; manipulation
• Details and demo in the SDSU booth
  (the data-volume arithmetic is checked below)
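The total on this slide is just the per-time-step size times the number of steps, and a single 4-byte float variable on a "4K" uniform grid lands in the same size range. The check below reads 4K as either 4000 or 4096 cells per side, which is an interpretation, not something stated on the slide.

# Sanity-check the wide-area data volumes quoted on the slide.
steps = 577
gb_per_step = 257                      # from the slide

print(f"total: {steps * gb_per_step / 1000:.0f} TB")   # ~148 TB, i.e. ~150 TB

# A single float32 variable on a "4K" uniform grid is in the same ballpark,
# whether 4K is read as 4000 or 4096 cells per side (assumption).
for n in (4000, 4096):
    print(f"{n}^3 cells x 4 bytes = {n**3 * 4 / 1e9:.0f} GB per time step")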