Abstract Storage: Moving file format-specific abstractions into petabyte-scale storage systems
Joe Buck, Noah Watkins, Carlos Maltzahn & Scott Brandt
Introduction
• The current HPC environment separates computation from storage
  – Traditional focus on computation, not I/O
  – Applications require I/O architecture independence
• Many scientific applications are data intensive
• Performance is increasingly limited by data movement
HPC Architecture
Diagram courtesy of Rob Ross, Argonne National Laboratory
HPC Architecture: HW bottleneck
Current bottleneck is in the controllers
Diagram courtesy of Rob Ross, Argonne National Laboratory
HPC Architecture: HW bottleneck
Future bottleneck: the network between I/O nodes and storage nodes
Diagram courtesy of Rob Ross, Argonne National Laboratory
Approach: Move functions closer to data
• Use spare CPU cycles at intelligent storage nodes
  – Replace communication with CPU cycles
• Provide storage interfaces with higher abstractions
• Enable file system optimizations due to knowledge of data structure
• Do this for a small selection of data structures
  – This is not another object-oriented database!
Why Now?
• Parallel file systems are moving more intelligence into storage nodes anyway
• Advances in performance management and virtualization
• Moving bytes is slated to be a dominant cost in exascale systems
• Scientific file formats and operators are increasingly standard
  – NetCDF, HDF
• Structured abstractions have seen recent success
  – BigTable, MapReduce
  – CouchDB
Abstract Storage: Storage as an Abstract Data Type
• An ADT decouples interface from implementation
• Only a few ADTs are necessary, e.g.:
  – Dictionary (key/value pairs)
  – Hypercube (coordinate systems)
  – Queue
• Optimize each one for each parallel architecture
  – Data placement
  – Performance management
  – Buffer cache management (incl. pre-fetching)
  – Coherence
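The interface/implementation split above can be sketched in Python (the prototype's own language). This is an illustrative sketch only; the class and method names are hypothetical, not the prototype's actual API, and a real backend would use an on-disk structure such as a skip list rather than an in-memory dict.

```python
from abc import ABC, abstractmethod


class DictionaryADT(ABC):
    """Key/value ADT: clients see only this interface, never the
    storage-side implementation (skip list, B-tree, ...)."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def remove(self, key): ...


class InMemoryDictionary(DictionaryADT):
    """Trivial reference implementation backed by a Python dict;
    stands in for a storage-node-optimized backend."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def remove(self, key):
        self._data.pop(key, None)


d = InMemoryDictionary()
d.put("temperature", 21.5)
print(d.get("temperature"))  # 21.5
```

Because clients code against `DictionaryADT`, the file system is free to swap in a placement- or cache-optimized implementation per architecture without changing application code.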
ADTs and Scientific Data
• Scientific data is normally multi-dimensional, lending itself well to this approach
  – Multi-dimensional and hierarchical structures are readily mapped onto data types
• Multiple structures can be mapped onto (portions of) the same data for more efficient access
  – Operate on the appropriate structure (matrix, row, element, etc.)
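One concrete way a hypercube (coordinate-system) ADT can map logical cells onto an underlying byte stream is row-major linearization. A minimal sketch, assuming row-major layout; the function name and shape are illustrative:

```python
def flat_index(coord, shape):
    """Row-major linearization of an N-dimensional coordinate:
    the flat index a coordinate-system ADT could use to locate a
    cell within a contiguous byte-stream object."""
    idx = 0
    for c, dim in zip(coord, shape):
        assert 0 <= c < dim, "coordinate out of bounds"
        idx = idx * dim + c
    return idx


shape = (4, 5, 6)                     # a 4x5x6 hypercube
print(flat_index((1, 2, 3), shape))   # 1*30 + 2*6 + 3 = 45
```

With structural knowledge like this at the storage layer, the file system can stripe and pre-fetch along the dimensions an application actually traverses, rather than along raw byte ranges.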
Implementation Challenges
• Programming model for implementing ADTs
• Everything is based on byte streams
  – Current storage APIs (e.g. POSIX)
  – Current file system subsystems
    • Buffer cache
    • Striping strategies
    • Storage node interfaces
• Need awareness of structured data
  – New interfaces at various storage layers
Prototype: Ceph Doodle
• Focus: programming model for implementing ADTs
• Construction and test framework for:
  – Storage abstractions
  – ADT implementations
  – Programming models (flexibility, ease-of-use)
• Based on an object-based parallel file system architecture (e.g. Ceph)
Ceph Doodle Features
• Rapid prototyping:
  – Uses an RPC mechanism
  – Written in Python
• Support for plugins for different ADTs
  – Byte stream (implemented as storage objects)
  – Dictionary (implemented as skip lists)
Ceph Doodle Overview
[Architecture diagram]
• Clients use application-specific interfaces: Client Application → ADT-Operation(…) → Data Type
• Data types are cross-cutting system modules: ADT-Operation(…) → Striping → RPC_X(Op, ObjID, Context), RPC_Y(Op, ObjID, Context), RPC_Z(Op, ObjID, Context)
• Striping and caching strategies are optimized per data type
• Mappings route ADT RPCs to storage nodes: Client → RPC to OSD → OSD with Object → ADT Operation(Object, Context)
Dictionary Implementation: Skip Lists
[Figure: a skip list with levels 0–4, linking .head through keys 9, 23, 1024, 1025 to .tail]
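The skip list backing the dictionary ADT can be sketched minimally as follows. This is an in-memory illustration of the data structure itself, not the prototype's storage-side code; level cap and coin-flip probability are conventional choices, and the keys are those from the slide's figure.

```python
import random


class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # next node at each level


class SkipList:
    """Probabilistic ordered structure with expected O(log n) search:
    upper levels skip over runs of keys, level 0 links every key."""

    MAX_LEVEL = 4

    def __init__(self):
        self.head = SkipNode(None, self.MAX_LEVEL)
        self.level = 0

    def _random_level(self):
        lvl = 0
        while random.random() < 0.5 and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, key):
        update = [None] * (self.MAX_LEVEL + 1)
        node = self.head
        # Descend from the top level, recording the rightmost node
        # before the insertion point at each level.
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        if lvl > self.level:
            for i in range(self.level + 1, lvl + 1):
                update[i] = self.head
            self.level = lvl
        new = SkipNode(key, lvl)
        for i in range(lvl + 1):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, key):
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key


sl = SkipList()
for k in (9, 23, 1024, 1025):
    sl.insert(k)
print(sl.contains(23), sl.contains(100))  # True False
```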
Splitting skip lists across nodes
[Figure: the same skip list (levels 0–4, keys 9, 23, 1024, 1025) partitioned across multiple storage nodes]
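One plausible way to split an ordered key space across storage nodes, sketched under the assumption of range partitioning; the split points and node names below are purely illustrative, not taken from the prototype:

```python
import bisect

# Hypothetical split points: keys < 100 land on osd0,
# keys in [100, 1000) on osd1, keys >= 1000 on osd2.
SPLIT_POINTS = [100, 1000]
NODES = ["osd0", "osd1", "osd2"]


def node_for_key(key):
    """Map a dictionary key to the storage node holding its
    partition of the skip list (range partitioning)."""
    return NODES[bisect.bisect_right(SPLIT_POINTS, key)]


print(node_for_key(9), node_for_key(23), node_for_key(1024))
# osd0 osd0 osd2
```

Range partitioning keeps each node's shard an ordered skip list of its own, so range queries touch only the nodes whose key ranges overlap the query.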
Future Work
• Building on top of Ceph
  – New dynamically loadable object libraries
• Redesigning caching
  – Data structure boundaries vs. pages
  – Pre-fetching = access patterns = ADT parameters
• Rethinking striping strategies
• Unified views supported by a virtual ADT layer
• Embedding versioning and provenance capture into the file system
Thank you buck@cs.ucsc.edu