
Anthony Kougkas, Hariharan Devarajan, Xian-He Sun



  1. IRIS: I/O Redirection via Integrated Storage
     Anthony Kougkas, Hariharan Devarajan, Xian-He Sun (akougkas@hawk.iit.edu)
     Department of Computer Science, Illinois Institute of Technology
     ICS’18, Beijing, China, June 12th, 2018
     Special thanks to Dr. Shuibing He, who kindly agreed to help us present this work.

  2. Outline: • Background • Approach • Design • Evaluation • Conclusions • Q&A (ICS’18, 6/10/2018)

  3. Highlights of this work
     • Design of several mapping algorithms: files to objects; objects to files
     • Design and implementation of IRIS: cross-storage integrated data access; programming convenience and efficiency
     • Evaluation results showed IRIS can offer: a performance boost of up to 7x; minimal overheads
     Towards I/O convergence between HPC and HPDA storage subsystems

  4. Different Communities, Cultures, Tools
     “The tools and cultures of HPC and BigData analytics have diverged, to the detriment of both; unification is essential to address a spectrum of major research domains.” (D. Reed & J. Dongarra)

  5. Storage in HPC
     • Parallel File Systems (PFS):
       • Peak performance: ~2,000 GiB/s
       • Capacity: >70 PiB
       • Interfaces: POSIX, MPI-IO, HDF5, etc.
       • Limitations: scalability, complexity, metadata services, small-file access, data synchronization, etc.
     I/O 500 List (Nov 2017)

  6. Data Distribution: PFS vs. KVS
     • PFS uses fixed-size stripes; KVS stores each key-value pair as a single object.
     • PFS distributes data in a fixed manner (e.g., round robin); KVS distributes objects to all available nodes.
     • PFS needs sub-request synchronization; KVS has no need to synchronize anything.
     • PFS metadata must include the directory tree, permissions, the data’s physical location on disks, etc.; KVS uses a flat namespace with a hash table that keeps the mapping between keys and values.
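To make the PFS/KVS contrast concrete, here is a minimal C++ sketch assuming a plain round-robin stripe layout for the PFS and modulo-hash placement for the KVS. The function names (pfsLocate, pfsSubRequests, kvsLocate) are hypothetical; real systems such as OrangeFS or MongoDB use their own distribution functions. The sub-request count shows why a PFS must synchronize: a request that crosses a stripe boundary becomes several sub-requests on different servers, while a KVS object always lives on one node.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical location of one byte of a striped file.
struct StripeLocation {
    std::size_t server;         // which I/O server holds the byte
    std::uint64_t localOffset;  // offset inside that server's local data
};

// PFS-style placement (assumed): fixed-size stripes dealt round robin.
StripeLocation pfsLocate(std::uint64_t fileOffset, std::uint64_t stripeSize,
                         std::size_t numServers) {
    std::uint64_t stripeIndex = fileOffset / stripeSize;
    std::size_t server = static_cast<std::size_t>(stripeIndex % numServers);
    // Each server stores every numServers-th stripe back to back.
    std::uint64_t localStripe = stripeIndex / numServers;
    return {server, localStripe * stripeSize + fileOffset % stripeSize};
}

// Number of stripes (sub-requests) touched by the byte range [offset, offset + size).
std::uint64_t pfsSubRequests(std::uint64_t offset, std::uint64_t size,
                             std::uint64_t stripeSize) {
    if (size == 0) return 0;
    std::uint64_t first = offset / stripeSize;
    std::uint64_t last = (offset + size - 1) / stripeSize;
    return last - first + 1;
}

// KVS-style placement (assumed): the whole value lives on the node its key hashes to.
std::size_t kvsLocate(const std::string& key, std::size_t numNodes) {
    return std::hash<std::string>{}(key) % numNodes;
}
```

For example, with 64-byte stripes on 4 servers, a 10-byte request at offset 60 spans two stripes and therefore two servers, whereas a get() on any key touches exactly one node.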

  7. Data Models and Storage Systems
     • File-based storage systems:
       • POSIX I/O: fwrite(), fread()
       • MPI-IO: MPI_File_read(), MPI_File_write()
       • High-level I/O libraries, e.g., HDF5, pNetCDF, MOAB, etc.
     • Object-based storage systems:
       • REST APIs: get(), put(), delete()
       • Amazon S3, OpenStack Swift
       • NoSQL DBs
     • No “one-storage-for-all” solution: each system performs great for certain workloads.
     • Unification is essential.

  8. Challenges of I/O Convergence
     • Gap between traditional file-based storage and modern scalable data frameworks
     • Differences in architecture, programming models, and software tools
     • Lack of global namespaces; management of heterogeneous data and diverse resources

  9. Our Thesis
     A radical departure from the existing software stack for both communities is not realistic, and future software design and architectures will have to raise the abstraction level.
     We aim to design and develop a middleware system that offers a data path agnostic to the underlying data model and bridges the semantic and architectural gaps.
     We envision a storage system that leverages each subsystem’s strengths while the subsystems compensate for each other’s known limitations.

  10. Introducing IRIS: I/O Redirection via Integrated Storage
     IRIS creates a unified “storage language” to bridge the two very different storage camps: compute-centric and data-centric.

  11. IRIS Design
     • Middleware library
     • Wraps around I/O calls
     • Written in C++, modular design
     • Non-invasive: plugin nature
     • Links with applications (e.g., re-compile or LD_PRELOAD)
     • Existing datasets loaded upon bootstrapping via crawlers
     • Directory operations not supported
     • Deletions via invalidation

  12. IRIS Objectives
     • Enable MPI-based applications to access data in an object store without user intervention.
     • Enable HPDA-based applications to access data in a PFS without user intervention.
     • Enable a hybrid storage access layer agnostic to files or objects.

  13. IRIS Architecture
     • Decouples the storage interface
     • Abstracts the storage subsystem
     • Modular design allows the addition of more mappers and modules
     • PFS and KVS are equal “citizens”
     • Optimized for high performance
     Virtual File (Container): Name: string; FilePointer: size_t; Size: size_t; Objects: map<VirtualObjects>; InvalidObjects: map<VirtualObjects>
     Virtual Object: Name: string; Size: size_t; OffsetInContainer: size_t; Data: void*; LinkedObjects: vector<VirtualObjects>
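The Virtual File and Virtual Object fields listed on this slide can be sketched as C++ structs. This is an illustrative reading, not the IRIS source: the slide's `map<VirtualObjects>` names no key type, so the maps here are assumed to be keyed by object name; `Data: void*` is replaced by an owning byte buffer; and `LinkedObjects` holds object names rather than nested objects. The `invalidate` helper is a hypothetical illustration of "deletions via invalidation".

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Sketch of a Virtual Object (field names follow the slide; types partly assumed).
struct VirtualObject {
    std::string name;
    std::size_t size = 0;
    std::size_t offsetInContainer = 0;       // where the bytes sit inside the container
    std::vector<char> data;                  // slide: void*; owning buffer assumed here
    std::vector<std::string> linkedObjects;  // slide: vector<VirtualObjects>; names assumed
};

// Sketch of a Virtual File, i.e., a container of virtual objects.
struct VirtualFile {
    std::string name;
    std::size_t filePointer = 0;
    std::size_t size = 0;
    std::map<std::string, VirtualObject> objects;         // live objects, keyed by name (assumed)
    std::map<std::string, VirtualObject> invalidObjects;  // superseded or deleted objects

    // Hypothetical helper: "delete" by moving the object to invalidObjects
    // instead of erasing its bytes from storage.
    bool invalidate(const std::string& objName) {
        auto it = objects.find(objName);
        if (it == objects.end()) return false;
        invalidObjects[objName] = std::move(it->second);
        objects.erase(it);
        return true;
    }
};
```

Keeping invalidated objects in a separate map means a later cleanup pass can reclaim their space without any live read ever observing a half-deleted object.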

  14. Files-to-objects mappers
     • Balanced (IRIS default):
       • Ideal for mixed workloads (both fread() and fwrite()).
       • The file is divided into predefined, fixed-size, smaller units of data, called buckets.
       • Bucket size is a tunable parameter.
       • Natural mapping of buckets to objects.
     • Read-Optimized:
       • Ideal for read-only or read-heavy (e.g., >90% read) workloads.
       • Each write creates a plethora of new, variously sized objects.
       • Equivalent to replication: sacrifices disk space to increase availability for reads.
       • All available keys for a range of offsets are kept in a B+ tree.
     • Write-Optimized:
       • Ideal for write-only or write-heavy (e.g., >80% write) workloads.
       • Each request creates a new object.
       • A mapping of offset ranges to available keys is kept in a B+ tree for fast searching.
       • Update operations create a new object and invalidate the old one, ensuring consistency.
     • HDF5:
       • Exploits the rich metadata HDF5 offers to create better mappings.
       • Each HDF5 file creates 2 types of objects: a header object and data objects.
       • Variable-sized data objects are created based on each dataset’s dimensions and datatype.
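The Balanced mapper's bucket idea can be sketched as splitting a byte-range request on fixed bucket boundaries, producing one piece per bucket-object. This is a minimal sketch assuming a hypothetical key format `<file>#<bucket index>`; IRIS's real key naming and bookkeeping are not described on the slide.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// One piece of a file request after splitting it on bucket boundaries.
struct BucketPiece {
    std::string key;               // hypothetical key format: "<file>#<bucket index>"
    std::uint64_t offsetInBucket;  // where the piece starts inside its bucket-object
    std::uint64_t length;          // bytes of the request that fall in this bucket
};

// Balanced-style mapping: a fixed (tunable) bucket size, one object per bucket.
std::vector<BucketPiece> mapToBuckets(const std::string& file, std::uint64_t offset,
                                      std::uint64_t size, std::uint64_t bucketSize) {
    std::vector<BucketPiece> pieces;
    const std::uint64_t end = offset + size;
    while (offset < end) {
        std::uint64_t index = offset / bucketSize;
        std::uint64_t inBucket = offset % bucketSize;
        // Stop at whichever comes first: the bucket boundary or the end of the request.
        std::uint64_t len = std::min(bucketSize - inBucket, end - offset);
        pieces.push_back({file + "#" + std::to_string(index), inBucket, len});
        offset += len;
    }
    return pieces;
}
```

With a 4-byte bucket, a 10-byte write at offset 0 becomes three objects (4 + 4 + 2 bytes). The bucket size trades object count against the amount of partial-bucket I/O, which is why it is a tunable parameter.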

  15. Objects-to-files mappers
     • 1-to-1 (Simple):
       • Each object is mapped to a unique container.
       • Ideal for accessing existing datasets.
       • Good performance for a relatively small number of objects.
     • N-to-M (IRIS default):
       • A collection of objects is mapped to a collection of containers.
       • A threshold for creating new containers (default: every 128 MB) bounds the total number of containers.
       • Special container: an update container for padding.
     • N-to-1:
       • The entire keyspace is mapped to one container.
       • Virtual objects are written sequentially.
       • Updates are appended at the end of the file while invalidating the previous object.
       • Indexing is important for faster get() operations.
     • N-to-M (Optimized):
       • Objects are first hashed into a key space and then mapped to containers.
       • Containers are created according to a range of hash values, and their size is flexible.
       • Update operations write at the end of the container, invalidating the previous object.
       • Periodic container defragmentation saves storage space.
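The N-to-M (Optimized) mapper can be sketched as follows, under stated assumptions: the hash space is split into equal contiguous ranges (one container per range), each container is modeled as an in-memory byte string instead of a file, and a key-to-extent index serves get(). The class name and methods are hypothetical. Updates append and only count the superseded bytes as invalid, which is the space a periodic defragmentation pass would reclaim.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: hash-range objects-to-containers mapping with append-only updates.
class HashRangeContainers {
public:
    // numContainers must be at least 1.
    explicit HashRangeContainers(std::size_t numContainers)
        : containers_(numContainers), invalidBytes_(numContainers, 0) {}

    // Split the hash space into equal contiguous ranges, one per container.
    std::size_t containerFor(const std::string& key) const {
        std::size_t h = std::hash<std::string>{}(key);
        std::size_t width = std::numeric_limits<std::size_t>::max() / containers_.size();
        return std::min(h / width, containers_.size() - 1);  // clamp the top edge
    }

    // Write at the end of the container; an update invalidates the previous extent.
    void put(const std::string& key, const std::string& value) {
        std::size_t c = containerFor(key);
        auto it = index_.find(key);
        if (it != index_.end()) invalidBytes_[it->second.container] += it->second.length;
        index_[key] = Extent{c, containers_[c].size(), value.size()};
        containers_[c] += value;
    }

    // The index turns get() into a single lookup plus one contiguous read.
    bool get(const std::string& key, std::string& out) const {
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        const Extent& e = it->second;
        out = containers_[e.container].substr(e.offset, e.length);
        return true;
    }

    std::size_t containerSize(std::size_t c) const { return containers_[c].size(); }
    std::size_t invalidBytes(std::size_t c) const { return invalidBytes_[c]; }

private:
    struct Extent { std::size_t container, offset, length; };
    std::vector<std::string> containers_;    // stand-ins for container files
    std::vector<std::size_t> invalidBytes_;  // dead bytes a defragmentation pass could reclaim
    std::map<std::string, Extent> index_;    // key -> live extent
};
```

Appending instead of overwriting in place keeps every write sequential within a container; the cost is dead space, tracked here per container, which grows until defragmentation runs.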

  16. IRIS Data Flow Example
     • IRIS enables new data paths
     • Abstracts the underlying storage solution
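To illustrate what such a new data path can look like, here is a hedged C++ sketch in which file-style write()/read() calls are served by an object store. Each call is split into fixed-size buckets (the idea behind the Balanced mapper) and turned into get()/put() calls on a backend interface. ObjectBackend, InMemoryKVS, and RedirectedFile are hypothetical names; the in-memory map stands in for a real object store (it is not MongoDB's API), and this is not IRIS's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical backend interface: any object store exposing put/get.
class ObjectBackend {
public:
    virtual ~ObjectBackend() = default;
    virtual void put(const std::string& key, const std::string& value) = 0;
    virtual std::string get(const std::string& key) const = 0;  // "" if absent
};

// In-memory stand-in for an object store.
class InMemoryKVS : public ObjectBackend {
public:
    void put(const std::string& key, const std::string& value) override { store_[key] = value; }
    std::string get(const std::string& key) const override {
        auto it = store_.find(key);
        return it == store_.end() ? std::string() : it->second;
    }
    std::size_t objectCount() const { return store_.size(); }
private:
    std::map<std::string, std::string> store_;
};

// File-style calls on the application side, object calls on the storage side.
class RedirectedFile {
public:
    RedirectedFile(std::string name, std::uint64_t bucketSize, ObjectBackend& backend)
        : name_(std::move(name)), bucket_(bucketSize), backend_(backend) {}

    void write(std::uint64_t offset, const std::string& data) {
        std::uint64_t pos = 0;
        while (pos < data.size()) {
            std::uint64_t abs = offset + pos;
            std::uint64_t idx = abs / bucket_, in = abs % bucket_;
            std::uint64_t len = std::min<std::uint64_t>(bucket_ - in, data.size() - pos);
            std::string obj = backend_.get(key(idx));  // read-modify-write of one bucket
            if (obj.size() < in + len) obj.resize(in + len, '\0');
            obj.replace(in, len, data, pos, len);
            backend_.put(key(idx), obj);
            pos += len;
        }
    }

    // Stops at the first missing or short bucket (sparse files are not handled).
    std::string read(std::uint64_t offset, std::uint64_t size) const {
        std::string out;
        const std::uint64_t end = offset + size;
        while (offset < end) {
            std::uint64_t idx = offset / bucket_, in = offset % bucket_;
            std::uint64_t len = std::min(bucket_ - in, end - offset);
            std::string obj = backend_.get(key(idx));
            if (obj.size() <= in) break;
            out.append(obj, in, len);  // append clamps at the end of obj
            if (obj.size() < in + len) break;
            offset += len;
        }
        return out;
    }

private:
    std::string key(std::uint64_t idx) const { return name_ + "#" + std::to_string(idx); }
    std::string name_;
    std::uint64_t bucket_;
    ObjectBackend& backend_;
};
```

Because RedirectedFile only sees the ObjectBackend interface, the application code stays the same whichever store sits behind it, which is the sense in which the data path abstracts the underlying storage solution.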

  17. Testbed: Chameleon System
     • Appliance: Bare Metal
     • OS: CentOS 7.1
     • Storage: OrangeFS 2.9.6, MongoDB 3.4.3
     • MPI: MPICH 3.2
     • Programs: synthetic benchmark, Montage, CM1 from NCSA, WRF, LAMMPS, K-means, LANL anonymous scientific simulation


