
NCCS Data Analytics and Storage System (DASS), May 4, 2016 - PowerPoint PPT Presentation



  1. National Aeronautics and Space Administration. NCCS Data Analytics and Storage System (DASS), May 4, 2016. High Performance Science. www.nasa.gov

  2. DASS Concept
     Data Analytics and Storage System (DASS): ~10 PB
     ADAPT: Read access from all nodes within the ADAPT system
       • Serve to data portal services
       • Serve data to virtual machines for additional processing
       • Mixing model and observations
     Climate Analytics as a Service: Analytics through web services or higher-level APIs are executed and passed down into the centralized storage environment for processing; answers are returned. Only those analytics that we have written are exposed.
     HyperWall: Read access from the HyperWall to facilitate visualizing model outputs quickly after they have been created.
     Mass Storage: Read and write access from the mass storage
       • Stage data into and out of the centralized storage environment as needed
     HPC - Discover: Write and read from all nodes within Discover; models write data into GPFS, which is then staged into the centralized storage (burst buffer like). Initial data sets could include:
       • Nature Run
       • Downscaling Results
       • Reanalysis (MERRA, MERRA2)
       • High Resolution Reanalysis
     Note that more than likely all the services will still have local file systems to enable local writes within their respective security domain.

  3. What are we doing to get there?
     • The NCCS is interested in POSIX-compliant object storage, so the following options are being evaluated:
       • HDFS, to establish a baseline
       • Cloudera with the GPFS HDFS Transparency connector
       • Lustre with the Hadoop Adapter for MapReduce/YARN (HAM) and the Hadoop Adapter for Lustre (HAL)

  4. DASS Software Stack
     Traditional HPC (Classical Usage Patterns): Data is moved to the process
       • MPI, Open, Read, Write, etc.
       • Network: IB, RDMA
       • POSIX Interface
       • Parallel File System: IBM Spectrum Scale (GPFS)
       • Traditional HPC Storage
     Big Data Analytics (Hadoop-Like Usage): Analytics moved to the data
       • MapReduce, Spark, ML, etc.
       • Cloudera, Horton, BDAS
       • Hadoop GPFS Connector
       • RESTful Interface
       • Object Store/POSIX: IBM Spectrum Scale (GPFS)
       • Server & JBOD, Commodity-Based Hardware
     Very large, scaling both horizontally (throughput) and vertically (capacity); permeated with compute capability at all levels
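
The "analytics moved to the data" pattern, combined with the rule that only analytics written by the service operators are exposed, can be sketched roughly as follows. This is a minimal illustration in plain Python, not the NCCS implementation: the names `EXPOSED_ANALYTICS` and `run_analytic` are hypothetical, and in-memory lists stand in for data chunks that would actually reside in the centralized storage.

```python
from functools import reduce
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in: each chunk is a list of values that would
# really live on a storage node inside the centralized environment.
Chunk = List[float]

# Each exposed analytic is a (map, reduce, finish) triple.
# map runs once per chunk (next to the data), reduce merges two
# partial results, finish turns the merged partial into the answer.
Analytic = Tuple[Callable, Callable, Callable]

EXPOSED_ANALYTICS: Dict[str, Analytic] = {
    "mean": (
        lambda chunk: (sum(chunk), len(chunk)),   # partial sum and count
        lambda a, b: (a[0] + b[0], a[1] + b[1]),  # merge two partials
        lambda acc: acc[0] / acc[1],              # final mean
    ),
    "max": (
        lambda chunk: max(chunk),                 # per-chunk maximum
        lambda a, b: a if a >= b else b,          # larger of two partials
        lambda acc: acc,                          # already the answer
    ),
}


def run_analytic(name: str, chunks: List[Chunk]) -> float:
    """Run a pre-written analytic over all chunks and return only the answer.

    Requests for analytics that are not in the registry are refused,
    mirroring the idea that arbitrary user code is never executed.
    Assumes at least one non-empty chunk.
    """
    if name not in EXPOSED_ANALYTICS:
        raise PermissionError(f"analytic {name!r} is not exposed")
    map_fn, reduce_fn, finish_fn = EXPOSED_ANALYTICS[name]
    # In a real deployment the map step would execute where each chunk
    # is stored; here it is an ordinary loop for illustration.
    partials = [map_fn(chunk) for chunk in chunks]
    return finish_fn(reduce(reduce_fn, partials))


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [4.0, 5.0]]
    print(run_analytic("mean", data))  # 3.0
    print(run_analytic("max", data))   # 5.0
```

Only the small per-chunk partial results and the final answer cross the network in this arrangement, whereas the classical pattern would copy every chunk to the compute process first.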
