
Exploring Use-cases for Non-Volatile Memories in support of HPC Resilience - PowerPoint PPT Presentation



  1. Exploring Use-cases for Non-Volatile Memories in support of HPC Resilience
  Onkar Patil¹, Saurabh Hukerikar², Frank Mueller¹, Christian Engelmann²
  ¹Dept. of Computer Science, North Carolina State University
  ²Computer Science and Mathematics Division, Oak Ridge National Laboratory

  2. MOTIVATION
  • Exaflop computers → compute devices + memory devices + interconnects + cooling and power, all in close proximity
  • Manufacturing processes are not foolproof → lower durability and reliability
  • Frequency of device failures and data corruptions ↑ → effectiveness and utility ↓
  • Future applications need to be more resilient, maintain a balance between performance and power consumption, and minimize trade-offs

  3. PROBLEM STATEMENT
  • Non-volatile memory (NVM) technologies → maintain the state of computation in the primary memory architecture
  • More potential as specialized hardware
  • Data retention → critical in improving the resilience of an application against crashes
  • Persistent memory regions to improve HPC resiliency → key aspect of this project
  [Diagram: three memory usage modes, each splitting an application's static and dynamic data structures across DRAM and NVM: NVM-based Main Memory, Application-directed Checkpointing, and Data Versioning]
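To make the application-directed checkpointing mode from the diagram concrete, here is a minimal sketch that copies a critical data structure from DRAM into a persistent memory region and flushes it to the persistence domain. It assumes PMDK's libpmem and a hypothetical DAX-mounted file path; it is not the runtime or region layout used in the paper.

```c
/*
 * Minimal sketch of application-directed checkpointing into a persistent
 * memory region, assuming PMDK's libpmem (not the paper's actual runtime).
 * The file path, region size, and checkpoint layout are illustrative only.
 */
#include <libpmem.h>
#include <stdio.h>
#include <stdlib.h>

#define CKPT_PATH "/mnt/pmem/dgemm.ckpt"    /* hypothetical DAX-mounted NVM */

int main(void)
{
    size_t n = 1000 * 1000;                 /* 1000x1000 matrix of doubles */
    double *a = malloc(n * sizeof(double)); /* working copy lives in DRAM  */
    for (size_t i = 0; i < n; i++) a[i] = (double)i;

    /* Map (and create if needed) a persistent region large enough for one copy. */
    size_t mapped_len;
    int is_pmem;
    double *ckpt = pmem_map_file(CKPT_PATH, n * sizeof(double),
                                 PMEM_FILE_CREATE, 0666,
                                 &mapped_len, &is_pmem);
    if (ckpt == NULL) { perror("pmem_map_file"); return 1; }

    /* Application-directed checkpoint: copy the critical data structure and
     * flush it to the persistence domain in one call. */
    pmem_memcpy_persist(ckpt, a, n * sizeof(double));

    /* On restart, the same mapping would be read back instead of written. */
    pmem_unmap(ckpt, mapped_len);
    free(a);
    return 0;
}
```

A build against PMDK would link with -lpmem; on restart, mapping the same file recovers the last persisted copy of the data structure.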

  4. RESULTS
  • Experimental setup: 16-node cluster; dual-socket, quad-core AMD Opteron nodes with 128 GB of DRAM and Intel SSDs ranging from 100 GB to 256 GB
  • DGEMM benchmark from the HPCC benchmark suite
  • Tested on 4-, 8-, and 16-node configurations for matrix sizes of 1000, 2000, and 3000 elements
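As background on what the benchmark measures, the sketch below times one dense matrix multiply and reports GFLOPS as 2N³ floating-point operations over the elapsed time, the conventional DGEMM rate. It assumes a generic CBLAS library rather than the HPCC StarDGEMM driver, with the matrix size matching the smallest configuration above.

```c
/*
 * Sketch of the per-node DGEMM rate that a StarDGEMM-style run reports:
 * GFLOPS = 2*N^3 / elapsed_seconds / 1e9.  Assumes a CBLAS library; this is
 * not the HPCC driver itself, just an illustration of the measured kernel.
 */
#include <cblas.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    const int n = 1000;                      /* smallest problem size tested */
    double *a = malloc((size_t)n * n * sizeof(double));
    double *b = malloc((size_t)n * n * sizeof(double));
    double *c = calloc((size_t)n * n, sizeof(double));
    for (int i = 0; i < n * n; i++) { a[i] = 1.0; b[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, a, n, b, n, 0.0, c, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("N=%d  time=%.3f s  GFLOPS=%.2f\n",
           n, secs, 2.0 * n * n * n / secs / 1e9);

    free(a); free(b); free(c);
    return 0;
}
```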

  5. [Charts: GFLOPS and execution times in node scaling for StarDGEMM, comparing DRAM, PMEM_ONLY, PMEM_CPY, and PMEM_VER allocation modes on 4, 8, and 16 nodes]
  • DRAM-only allocation and NVM-based main memory perform better
  • An inefficient lookup algorithm adds overhead in the copy- and version-based modes

  6. [Charts: execution time and GFLOPS for problem size scaling in StarDGEMM with 1000, 2000, and 3000 elements, comparing DRAM, PMEM_ONLY, PMEM_CPY, and PMEM_VER]
  • All modes perform similarly and consistently for node and data scaling
  • Execution time increases exponentially when multiple copies of memory are maintained
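The cost of keeping multiple copies is easiest to see in a versioning-by-copying scheme: every commit duplicates the whole region. The sketch below is a minimal illustration of that idea using a small ring of version buffers in ordinary memory; it is not the paper's PMEM_VER implementation, and the ring depth and region size are illustrative.

```c
/*
 * Minimal sketch of data versioning by copying: each commit duplicates the
 * working region into the next slot of a small version ring.  In a PMEM_VER-
 * style mode the slots would live in NVM; here they are plain heap buffers.
 */
#include <stdlib.h>
#include <string.h>

#define NUM_VERSIONS 4

struct versioned_region {
    size_t size;                    /* bytes per version               */
    int    head;                    /* slot holding the newest version */
    char  *slots[NUM_VERSIONS];
};

static struct versioned_region *vr_create(size_t size)
{
    struct versioned_region *vr = malloc(sizeof(*vr));
    vr->size = size;
    vr->head = -1;
    for (int i = 0; i < NUM_VERSIONS; i++)
        vr->slots[i] = malloc(size);
    return vr;
}

/* Commit a new version: one full copy of the region per call, which is why
 * execution time grows quickly with region size and version count. */
static void vr_commit(struct versioned_region *vr, const void *working)
{
    vr->head = (vr->head + 1) % NUM_VERSIONS;
    memcpy(vr->slots[vr->head], working, vr->size);
}

/* Roll back to the most recently committed version after a detected error. */
static void vr_restore(const struct versioned_region *vr, void *working)
{
    if (vr->head >= 0)
        memcpy(working, vr->slots[vr->head], vr->size);
}

int main(void)
{
    size_t bytes = 1000 * 1000 * sizeof(double);
    double *work = calloc(1, bytes);
    struct versioned_region *vr = vr_create(bytes);

    vr_commit(vr, work);    /* checkpoint a version */
    vr_restore(vr, work);   /* roll back on failure */

    free(work);
    return 0;
}
```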

  7. CONCLUSION & FUTURE WORK
  • Conclusion: non-volatile memory devices can be used as specialized hardware for improving the resilience of the system
  • Future work:
  – Memory usage modes that make applications efficient and maintain complete system state with minimal overhead
  – Support for more complex applications
  – Lightweight recovery mechanisms that work with the checkpointing schemes to reduce downtime and rollback time
  – Intelligent policies that help build a resilient static and dynamic runtime system

  8. Evaluating Performance of Burst Buffer Models for Real-Application Workloads in HPC Systems
  Harsh Khetawat, Frank Mueller, Christopher Zimmer

  9. Introduction
  • Existing storage systems are becoming a bottleneck
  • Solution: burst buffers
  • Use burst buffers for:
  – Checkpoint/restart I/O
  – Staging
  – Write-through cache for the parallel FS
  [Figure: Burst Buffers on Cori]

  10. Placement
  • Burst buffer placement options:
  – Co-located with compute nodes (Summit)
  – Co-located with I/O nodes (Cori)
  – Separate set of nodes
  • Trade-offs in the choice of placement:
  – Capability: I/O models, staging, etc.
  – Predictability: impact on shared resources, runtime variability
  – Economic: infrastructure reuse, cost of storage device
  • I/O performance depends on placement and the choice of network topology
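As a rough illustration of how placement translates into I/O performance, the sketch below estimates checkpoint drain time as checkpoint volume divided by the aggregate bandwidth a placement exposes. Every number in it is a hypothetical assumption, not a measurement of Summit, Cori, or the simulated systems.

```c
/*
 * Back-of-the-envelope sketch: checkpoint drain time = checkpoint volume /
 * aggregate bandwidth of the chosen placement.  All sizes and bandwidths
 * below are assumed values for illustration only.
 */
#include <stdio.h>

static double drain_seconds(double checkpoint_gib, double aggregate_gib_per_s)
{
    return checkpoint_gib / aggregate_gib_per_s;
}

int main(void)
{
    double checkpoint_gib = 4096.0;   /* assumed 4 TiB application checkpoint */

    struct { const char *placement; double bw_gib_per_s; } cases[] = {
        { "node-local burst buffers (per-node SSDs, aggregated)", 1600.0 },
        { "shared burst buffer on I/O nodes",                      800.0 },
        { "parallel file system only",                             200.0 },
    };

    for (int i = 0; i < 3; i++)
        printf("%-55s %7.1f s\n", cases[i].placement,
               drain_seconds(checkpoint_gib, cases[i].bw_gib_per_s));
    return 0;
}
```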

  11. Idea
  • Simulate network and burst buffer architectures:
  – CODES simulation suite
  – Real-world I/O traces (Darshan)
  – Full multi-tenant system with mixed workloads (capability/capacity)
  – Supports multiple network topologies
  – Local and external storage models
  • Combine network topologies and storage architectures
  • Evaluate performance under striping/protection schemes
  • Reproducible tool for HPC centers

  12. Conclusion
  • Determine, based on workload characteristics:
  – Burst buffer placement
  – Network topology
  – Performance of striping across burst buffers
  – Overhead of resilience schemes
  • Reproducible tool to:
  – Simulate specific workloads
  – Determine the best fit

  13. Thank You
