
DeltaFS Indexed Massive Dir: Software-Defined Storage for Fast Query. PDSW-DISCS 2017. Qing Zheng, George Amvrosiadis, Saurabh Kadekodi, Michael Kuchnik, Chuck Cranor, Garth Gibson, Brad Settlemyer, Gary Grider, Fan Guo. Carnegie Mellon University, Los Alamos National Laboratory (LANL).


  1. DeltaFS Indexed Massive Dir: Software-Defined Storage for Fast Query. PDSW-DISCS 2017. Qing Zheng, George Amvrosiadis, Saurabh Kadekodi, Michael Kuchnik, Chuck Cranor, Garth Gibson, Brad Settlemyer, Gary Grider, Fan Guo. Carnegie Mellon University, Los Alamos National Laboratory (LANL).

  2. DeltaFS Indexed Massive Dir: key features. (1) Requires no dedicated resources. (2) Almost no post-processing is needed. (3) Low I/O overhead. http://www.pdl.cmu.edu/ PDSW-DISCS 2017

  3. DeltaFS Indexed Massive Dir: target workloads. (1) Data-intensive HPC simulations. (2) Not designed for indexing checkpoints. (3) I/O bandwidth is limited.

  4. Agenda. Part 1 – Motivation. Part 2 – In-situ indexing design. Part 3 – API, LANL VPIC integration. Conclusion.

  5. Existing HPC workflows build indexes during post-processing. [Diagram: steps 1-3 connecting App, Lustre (write), Indexing, Tmp, and Queries.] Queries are delayed until post-processing is done (5-20% of simulation time).

  6. Problem faced: the increasing time-to-science, due to the growing gap between compute and I/O. Inefficient support for small data. [Timeline: simulation start, query finish.]

  7. Processing data in-transit while data is written to storage. [Diagram labels: App, MapReduce, Indexing, Lustre, Tmp, Queries.] Needs separate resources for sorting and indexing.

  8. In-situ indexing directly on app nodes using app resources. [Diagram labels: App + Indexing, data + index, Lustre, Tmp, Queries.] No need for a separate indexing cluster.

  9. Key idea: Reuse storage write-back buffering and idle CPU cycles for in-situ indexing.

  10. Example app: LANL VPIC. Particle: 40 bytes. Each VPIC process simulates millions of particles. Particles move across VPIC simulation processes during a simulation. Small random writes. After simulation: highly selective queries.

  11. File-per-process: TBs of I/O per trajectory fetch. [Diagram: simulation procs (P, 1M) each write one output file (data object, 1M) per VPIC process, with particles A-F scattered across the files; querying a single particle trajectory (A, B, C) searches TBs of data.]

  12. 5,000x faster than baseline with DeltaFS in-situ indexing. [Chart: query time (sec, log scale from 0.0625 to 4096) for reading a single particle trajectory (10TB, 48 billion particles); DeltaFS (w/ 1 CPU core) vs. Baseline (full-system parallel scan w/ 3k CPU cores).]

  13. Part II. System design: light-weight in-situ indexing. (1) Tiny mem footprint. (2) Zero write amplification. (3) No read back.

  14. Resource-efficient indexing by log-structured I/O. [Diagram labels: App proc, App thread, buffer, Indexing thread, data log, index, Lustre.] Tiny memory footprint, full storage bandwidth utilization.

  15. LSM-trees compact all the time, but we can't afford it. [Timeline: the total simulation alternates Compute and I/O phases.] Must aim for low I/O overhead at 10%-20%. Compaction easily causes 1000% I/O overhead by reading/writing previously written data.
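
As a rough illustration of why compaction is hard to afford under a 10%-20% I/O budget, the toy model below counts the extra I/O of a tiered merge policy. It is a sketch under stated assumptions, not a measurement of any real LSM-tree: every flush writes one unit of data, `fanout` runs on the same level are merged into one run on the next level, and each merge reads and rewrites its inputs in full. The function name and its parameters are hypothetical.

```python
def compaction_overhead(num_flushes: int, fanout: int) -> float:
    """Extra I/O (reads + rewrites) per unit of application data written.

    Toy model (assumption): each flush produces a run of size 1; whenever
    `fanout` runs accumulate on a level they are merged into a single run
    that moves up one level.
    """
    levels = []      # levels[i] holds the sizes of the runs currently on level i
    extra_io = 0     # units read plus units rewritten by merges
    for _ in range(num_flushes):
        run, level = 1, 0
        while True:
            if len(levels) <= level:
                levels.append([])
            levels[level].append(run)
            if len(levels[level]) < fanout:
                break
            merged = sum(levels[level])
            extra_io += 2 * merged   # read every input once, write the output once
            levels[level] = []
            run, level = merged, level + 1
    return extra_io / num_flushes


if __name__ == "__main__":
    ratio = compaction_overhead(num_flushes=32, fanout=2)
    print(f"extra I/O per unit written: {ratio:.1f} ({ratio * 100:.0f}% overhead)")
```

In this model, 32 flushes with a fanout of 2 give 10 units of extra I/O per unit written (1000% overhead), because every unit is read and rewritten once on each of five merge levels.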

  16. In-situ indexing by aggressive data partitioning. [Diagram: between Compute and I/O phases, an all-to-all shuffle regroups records written in the order C A D F E B into A B C D E F across App process #0, #1, and #2.] Bounds the amount of data needed per query per timestep.
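
The partitioning idea on this slide can be sketched as follows. This is a minimal single-process simulation, assuming records are routed by a hash of the particle ID; the slides do not state which partitioning function is used, and `owner` and `shuffle` are hypothetical names, not DeltaFS functions.

```python
import zlib
from collections import defaultdict


def owner(particle_id: str, num_procs: int) -> int:
    """Pick the process that stores every record of this particle.

    Assumption: CRC32 of the ID modulo the process count; any deterministic
    hash would serve the same purpose.
    """
    return zlib.crc32(particle_id.encode()) % num_procs


def shuffle(writes_per_proc, num_procs):
    """Simulate an all-to-all shuffle.

    writes_per_proc[src] is the list of (particle_id, payload) pairs written
    by process `src`. Returns {destination process: [(particle_id, payload)]}.
    """
    partitions = defaultdict(list)
    for writes in writes_per_proc:
        for particle_id, payload in writes:
            partitions[owner(particle_id, num_procs)].append((particle_id, payload))
    return partitions


if __name__ == "__main__":
    writes = [
        [("A", b"a0"), ("C", b"c0")],   # written by process 0
        [("A", b"a1"), ("B", b"b0")],   # written by process 1
        [("C", b"c1")],                 # written by process 2
    ]
    for dest, records in sorted(shuffle(writes, 3).items()):
        print(dest, records)
```

Because the destination depends only on the particle ID, a query for one particle needs to look at a single partition per timestep, no matter which process produced the record.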

  17. In-situ indexing as a file system lib component. [Diagram labels: App data, WriteBuffer, shuffle sender, shuffle receiver, all-to-all shuffle, data block, filter, index block, Data Log, Index Log.] No dedicated cluster needed.
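
The write path on this slide can be approximated by a small in-memory sketch. It assumes a fixed-size write buffer that is sorted and appended as one data block per flush, with one index entry per block; the filter block is not modeled, and `PartitionWriter` and its methods are hypothetical names rather than DeltaFS code.

```python
class PartitionWriter:
    """Append-only writer: buffered records go to a data log plus an index log."""

    def __init__(self, buffer_limit: int = 4):
        self.buffer = []              # pending (key, value) records
        self.buffer_limit = buffer_limit
        self.data_log = bytearray()   # stand-in for the on-storage data log
        self.index_log = []           # one {key: [(offset, length)]} map per block

    def put(self, key: str, value: bytes) -> None:
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self) -> None:
        """Sort the buffer by key and append it as one block; nothing is rewritten."""
        if not self.buffer:
            return
        self.buffer.sort(key=lambda kv: kv[0])   # stable: keeps write order per key
        block_index = {}
        for key, value in self.buffer:
            block_index.setdefault(key, []).append((len(self.data_log), len(value)))
            self.data_log += value
        self.index_log.append(block_index)
        self.buffer.clear()

    def get(self, key: str):
        """Return all flushed values for `key`, using only the index entries."""
        values = []
        for block_index in self.index_log:
            for offset, length in block_index.get(key, []):
                values.append(bytes(self.data_log[offset:offset + length]))
        return values


if __name__ == "__main__":
    writer = PartitionWriter(buffer_limit=2)
    writer.put("B", b"b0")
    writer.put("A", b"a0")   # buffer is full, so the first block is flushed
    writer.put("A", b"a1")
    writer.flush()
    print(writer.get("A"))
```

Each byte is written to the data log exactly once and never read back during writing, which is the property the design slides call zero write amplification and no read back.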

  18. Part III. Programming interface: Indexed Massive Directory (IMD). In-situ indexing keyed on filenames: mkdir("./particles", DELTAFS_IMD)

  19. How to use Indexed Massive Dir (IMD). (1) Data searched together goes into a single IMD file, e.g. one file for each particle. (2) Create as many IMD files as you want, e.g. 1 trillion files for 1 trillion particles. Query your data by "open-read-close".
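
The file-per-particle access pattern described on this slide can be sketched with ordinary files. This sketch does not use the DeltaFS API: a plain temporary directory stands in for an IMD (so nothing is actually indexed), the 40-byte record layout of ten 32-bit floats is an assumption, and `append_state`, `read_trajectory`, and `demo` are hypothetical helpers.

```python
import os
import struct
import tempfile

# Assumed layout: ten little-endian 32-bit floats, 40 bytes per particle record.
RECORD = struct.Struct("<10f")


def append_state(directory: str, particle_id: str, state) -> None:
    """Append one timestep's state to the file named after the particle."""
    with open(os.path.join(directory, particle_id), "ab") as f:
        f.write(RECORD.pack(*state))


def read_trajectory(directory: str, particle_id: str):
    """Query by open-read-close: one file holds the whole trajectory."""
    with open(os.path.join(directory, particle_id), "rb") as f:
        data = f.read()
    return [RECORD.unpack_from(data, off) for off in range(0, len(data), RECORD.size)]


def demo():
    """Write three timesteps for two particles, then fetch one trajectory."""
    with tempfile.TemporaryDirectory() as particles:
        for timestep in range(3):
            for particle_id in ("A", "B"):
                append_state(particles, particle_id, [float(timestep)] * 10)
        return read_trajectory(particles, "A")


if __name__ == "__main__":
    for record in demo():
        print(record)
```

The point of the pattern is that the application names its data by the key it will later search on, so a trajectory query becomes a single file read instead of a scan over every per-process output file.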

  20. VPIC using DeltaFS IMD: file-per-particle. [Diagram: simulation procs (P, 1M) write one IMD file per VPIC particle (1T files) into an Indexed Massive Directory, stored as index objects (1M) and data objects; a search for particles A, B, C is labeled MBs against TBs of data.]

  21. LANL Trinity experiments: VPIC-Baseline (no post-processing) vs. VPIC-DeltaFS (DeltaFS indexing). 32 cores/node; 1-99 compute nodes; 496 million to 48 billion particles. [Diagram labels: VPIC, buffer, compute node, burst-buffer, SSD, Lustre, HDD, queries.]

  22. [Chart: query time (sec, log scale) of Baseline (full-system parallel scan) vs. DeltaFS (w/ 1 CPU core) by simulation size (million particles): 1 node (496), 2 nodes (992), 4 nodes (1,984), 8 nodes (3,968), 16 nodes (7,936), 33 nodes (16,368), 66 nodes (32,736), 99 nodes (49,104). Speedup labels, in extraction order: 5112x, 992x, 4049x, 532x, 245x, 625x, 2221x, 665x.]

  23. [Chart: I/O time per dump (sec) of Baseline vs. DeltaFS by simulation size, from 1 node (496 million particles) to 99 nodes (49,104 million particles), with regions marked "Tiny simulations" and "Bigger simulations". Ratio labels, in extraction order: 1.13x, 1.15x, 1.13x, 1.29x, 1.56x, 4.78x, 2.42x, 9.63x.]

  24. Conclusion. https://github.com/pdlfs/deltafs In-situ indexing for transparent, almost-free query acceleration: no dedicated nodes, no post-processing, ~15% I/O overhead. • Indexed Massive Dir (~3% app mem, compaction-free, POSIX API) • Powered by Mercury RPC https://mercury-hpc.github.io/ • DeltaFS is one of the Mochi micro-services https://press3.mcs.anl.gov/mochi/
