Large-Scale Adaptive Mesh Simulations Through Non-Volatile Byte-Addressable Memory
Bao Nguyen, Hua Tan, Xuechen Zhang, Kei Davis
Octree Meshing is Widely Used in HPC Simulation
• Droplet breakup
• Micro-boiling
• Droplet ejection
Quad/Octree-Based Adaptive Meshing
[Figure: domain decomposition and its quad/octree representation in DRAM]
Because models span larger length and time scales, DRAM demand is significant even on supercomputers.
Per-core DRAM Capacity is Shrinking on Supercomputers
• Jaguar: 2.7-4 GB/core
• Titan: 2 GB/core
due to the associated capital costs and power consumption.
Using Non-Volatile Byte-Addressable Memory for Meshing

         Non-Volatility  Byte-Addressability  Speed  Cost        Power
  Flash  Yes             No                   Low    Decreasing  Low
  DRAM   No              Yes                  High   Increasing  High
  NVBM   Yes             Yes                  High*  Decreasing  Low
Existing Applications Were Not Designed for NVBM
• In-core algorithms: linear octree [SC'07], parallel octree [SC'05], etc. But they save snapshots on storage systems for failure recovery; I/O can be the bottleneck.
• Out-of-core algorithms: Etree [SC'04], visualization [TVCG'97], etc. But they were designed for slow non-volatile media, e.g., SSDs and HDDs.
Can we support in-NVBM octree meshing, bypassing slow I/O buses?
Challenge I: NVBM Writes Incur Higher Latency
• NVBM write latency is 2.5X that of DRAM.
• Meshing operations (e.g., refinement) are write-intensive.
Challenge II: Existing Octrees Are Not Durable on NVBM
[Figure: the octree after a normal pointer write vs. after a failed pointer write]
A failure may leave a pointer linking to an undefined region in NVBM.
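The durability problem on this slide can be sketched in a few lines of C. This is an illustrative sketch, not the paper's actual code: the names (`octant`, `persist`, `link_octant`) are hypothetical, and `persist()` stands in for a cache-line flush plus fence (e.g., CLFLUSH/SFENCE on x86 NVBM hardware). The key ordering is that the new octant is made durable before the single atomic pointer store publishes it, so a crash at any point leaves either no new octant or a complete one, never a pointer into an undefined NVBM region.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct octant octant;
struct octant {
    int id;
    octant *_Atomic child;
};

/* Stand-in for flushing cache lines to NVBM and fencing;
 * a no-op here, real on persistent-memory hardware. */
static void persist(const void *addr, size_t len) {
    (void)addr; (void)len;
}

/* Durable link update:
 * 1: the new octant's payload is persisted first,
 * 2: a single 8-byte atomic store makes it reachable,
 * 3: the pointer itself is persisted. */
void link_octant(octant *parent, octant *node) {
    persist(node, sizeof *node);
    atomic_store(&parent->child, node);
    persist(&parent->child, sizeof(octant *));
}
```

A failed write in the middle of step 1 is invisible (the octant is unreachable); a failure after step 2 but before step 3 is resolved by the atomicity of the pointer store.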
Challenge III: Special Pointers Are Difficult to Handle
[Figure: special pointers crossing the DRAM/NVBM boundary of the tree]
Handling special pointers that cross the DRAM/NVBM boundary introduces extra complexity for application developers.
Design Objectives of the Persistent-Merged Octree
In-NVBM meshing & storage + hiding NVBM write latency + orthogonal persistence = persistent-merged octree (PM-octree)
PM-Octree Design: A Multi-Version Data Structure
• Version V_{i-1}: persistent, in NVBM
• Version V_i: volatile, in DRAM + NVBM
The persistent version provides the desired durability.
PM-Octree Design: Octant Sharing between Versions
Observation: many spatial domains do not change in adjacent time steps, so V_{i-1} and V_i can share octants of the C1 tree in NVBM.
Sharing reduces memory usage by up to 1.9X.
PM-Octree Design: Partitioned Data Structure
[Figure: version V_i split into a C0 tree in DRAM and a C1 tree in NVBM]
Partitioning effectively uses both DRAM and NVBM.
PM-Octree Design: Dynamic Layout Transformation
[Figure: subtrees moved between DRAM and NVBM as the layout is transformed]
Layout transformation is executed periodically to hide NVBM write latency.
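One idea behind a layout transformation like this can be sketched as follows. The sketch is hypothetical (the names `tnode` and `flatten_bfs` are not from the paper): octants scattered through DRAM are packed breadth-first into one contiguous buffer, so the subsequent copy to NVBM becomes a single sequential write instead of many small random writes, which helps hide NVBM's higher write latency.

```c
#include <stddef.h>

#define NCHILD 8     /* octree fan-out */
#define MAXN   64    /* queue capacity for this sketch */

typedef struct tnode {
    int id;
    struct tnode *child[NCHILD];
} tnode;

/* Pack the subtree rooted at 'root' into 'out' in breadth-first
 * order; returns the number of octants packed. The resulting
 * contiguous buffer can be written to NVBM in one sequential pass. */
int flatten_bfs(tnode *root, int out[], int max) {
    tnode *queue[MAXN];
    int head = 0, tail = 0, n = 0;
    queue[tail++] = root;
    while (head < tail && n < max) {
        tnode *cur = queue[head++];
        out[n++] = cur->id;                  /* packed, sequential order */
        for (int i = 0; i < NCHILD; i++)
            if (cur->child[i] && tail < MAXN)
                queue[tail++] = cur->child[i];
    }
    return n;
}
```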
Putting Together the Components of PM-Octree
PM-octree is a multi-version data structure for both in-memory meshing and storage. It provides near-instantaneous failure recovery because the persistent version is accessed over the memory bus.
Basic Operation: Octant Insertion
[Figure: the tree before and after inserting octant 11 -- a new root R', node u', and copy 9' are created for V_i, while unchanged octants are shared with V_{i-1}]
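The copy-on-write insertion pictured above can be sketched in C. This is a simplified, hypothetical sketch (names `onode` and `insert_cow` are not from the paper, and the path is given explicitly as child indices): only the nodes on the root-to-target path are duplicated into the new version V_i; every other subtree is shared with V_{i-1}, which stays intact as the recoverable persistent version.

```c
#include <stdlib.h>
#include <string.h>

#define NCHILD 8

typedef struct onode {
    int id;
    struct onode *child[NCHILD];
} onode;

/* Insert 'leaf' at the end of 'path' (a list of child indices of
 * length 'depth'), copying only the nodes along that path.
 * Returns the new root R' of version V_i; the old root (V_{i-1})
 * is left untouched. */
onode *insert_cow(const onode *root, const int *path, int depth, onode *leaf) {
    onode *copy = malloc(sizeof *copy);
    memcpy(copy, root, sizeof *copy);       /* untouched subtrees are shared */
    if (depth == 1)
        copy->child[path[0]] = leaf;        /* attach the new octant */
    else
        copy->child[path[0]] =
            insert_cow(root->child[path[0]], path + 1, depth - 1, leaf);
    return copy;
}
```

This mirrors the slide: inserting octant 11 yields R' and u' plus the shared, unmodified remainder of the previous version.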
Basic Operation: Octant Update
[Figure: the tree before and after updating octant 10 -- V_i receives a copy 10' along the copied path, while V_{i-1} remains unchanged]
PM-Octree Design: Orthogonal Persistence

  Routine                               Description
  pmoctree *pm_create(octree *tree)     create a new PM-octree; return a pointer to V_i
  void pm_persistent(pmoctree *tree)    create a persistent version of the octree
  pmoctree *pm_restore(void)            restore a PM-octree; return a pointer to V_i
  void pm_delete(pmoctree *tree)        delete all octants on NVBM and DRAM

We integrated it with the Gerris flow solver.
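The four routines in the table can be driven as in the sketch below. The stub bodies are hypothetical placeholders so the usage pattern compiles and runs (the real implementations live in the PM-octree library); here "NVBM" is a static variable and the octree payload is reduced to an element count. The point is the calling pattern: mesh, persist each step, then recover after a failure without touching the file system.

```c
#include <stdlib.h>

/* Hypothetical minimal stand-ins for the PM-octree types and API. */
typedef struct octree   { int nelem; } octree;
typedef struct pmoctree { octree vi; } pmoctree;

static octree nvbm_snapshot;            /* stand-in for the persistent version */

pmoctree *pm_create(octree *t) {        /* new PM-octree; returns pointer to V_i */
    pmoctree *p = malloc(sizeof *p);
    p->vi = *t;
    return p;
}
void pm_persistent(pmoctree *t) {       /* make the current version persistent */
    nvbm_snapshot = t->vi;
}
pmoctree *pm_restore(void) {            /* recover V_i over the memory bus */
    pmoctree *p = malloc(sizeof *p);
    p->vi = nvbm_snapshot;
    return p;
}
void pm_delete(pmoctree *t) { free(t); }

/* Usage pattern as a flow solver might drive it: refine, persist each
 * time step, then recover after a simulated failure. */
int run_and_recover(void) {
    octree mesh = { .nelem = 1 };
    pmoctree *pm = pm_create(&mesh);
    for (int step = 0; step < 3; step++) {
        pm->vi.nelem *= 2;              /* refinement grows the mesh */
        pm_persistent(pm);              /* durable version every step */
    }
    pm_delete(pm);                      /* "failure": volatile state is gone */
    pmoctree *back = pm_restore();      /* near-instant recovery */
    int n = back->vi.nelem;
    pm_delete(back);
    return n;
}
```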
Experimental Setting
• Hardware
  Ø Titan at ORNL
  Ø Emulation of NVBM using DRAM

                        DRAM   NVBM
    Read latency (ns)     60    100
    Write latency (ns)    60    150

• Simulation: droplet rotation and ejection
Comparison of Meshing Methods

  Method name          Objects in DRAM   Objects in NVBM   Interface
  In-core-octree       Octants           Snapshot          File system
  Out-of-core-octree   Octant cache      Record            File system
  PM-octree            Octants           Octants           Memory
Weak Scaling
• 1.2M to 1077M elements
• 1 to 1000 PEs
• Number of elements per PE: ~1 million
The execution time of PM-octree increases as the logarithm of the problem size.
Execution Time Breakdown with Weak Scaling
Tree-partitioning overhead prevents PM-octree from achieving an optimal speedup.
Strong Scaling
• Problem size: 150 million elements
• 240 to 1000 PEs
The scalability of PM-octree is similar to that of in-core-octree.
Execution Time Breakdown with Strong Scaling
No scalability issue: no major fluctuation is observed across PE counts.
Failure Recovery
• PM-octree reduces failure recovery time by up to 20X.
• PM-octree guarantees data consistency after failures.
Conclusions
• PM-octree effectively extends memory capacity using NVBM.
• It scales as well as in-core algorithms.
• It significantly reduces recovery time.
• It provides an easy-to-program interface.
Acknowledgments
Xuechen Zhang (xuechen.zhang@wsu.edu), Bao Nguyen, Hua Tan
Basic Operation: Octant Merging
[Figure: before and after merging the C0 subtree in DRAM into the C1 subtree in NVBM]
Basic Operation: Making a Version Persistent
[Figure: versions before and after the persist operation -- V_i becomes the persistent version and a new volatile version V_{i+1} with root R' is created]
Dynamic Layout Transformation
Execution time is reduced by 25% while the number of NVBM writes is reduced by up to 30%.
Impact of DRAM Size
Varying the DRAM size influences the merging frequency and the execution time.