  1. Research in Middleware Systems for In-Situ Data Analytics and Instrument Data Analysis
     Gagan Agrawal, The Ohio State University
     (Joint work with Yi Wang, Yu Su, Tekin Bicer, and others)

  2. Outline
     • Middleware Systems
       – Work on In Situ Analysis
       – Analysis of Instrument Data
     • Compression/Summarization of Streaming Data
       – Post-analysis using just the summary

  3. In Situ Analysis – Simulation Data
     • In-Situ Algorithms (Algorithm/Application Level)
       – No disk I/O
       – Indexing, compression, visualization, statistical analysis, etc.
     • In-Situ Resource Scheduling Systems (Platform/System Level)
       – Enhance resource utilization
       – Simplify the management of analytics code
       – GoldRush, Glean, DataSpaces, FlexIO, etc.
     • Are the two levels seamlessly connected?

  4. Opportunity
     • Explore the programming model level in the in-situ environment
       – Between the application level and the system level
       – Hides all the parallelization complexities behind a simplified API
       – A prominent example: MapReduce + in situ (see the sketch below)
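
     Smart's actual interface is not reproduced on this slide; the sketch below is only a rough
     illustration of the idea of a simplified, MapReduce-like API applied to in-memory simulation
     output. The Histogram reduction object, analyze_time_step, and the thread partitioning are
     hypothetical names and structure, not Smart's API.

        #include <array>
        #include <cstddef>
        #include <cstdio>
        #include <vector>

        // Hypothetical reduction object: a fixed-size histogram kept in memory and
        // updated in place as the resident time step is consumed (no disk I/O).
        struct Histogram {
            std::array<long, 10> bins{};

            void accumulate(double v, double lo, double hi) {
                int b = static_cast<int>((v - lo) / (hi - lo) * bins.size());
                if (b < 0) b = 0;
                if (b >= static_cast<int>(bins.size())) b = static_cast<int>(bins.size()) - 1;
                ++bins[b];
            }
            void merge(const Histogram& other) {              // combination step
                for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
            }
        };

        // A MapReduce-style in-situ pass over one time step: each thread reduces its
        // chunk into a private replica, then the replicas are combined. The user only
        // writes accumulate() and merge(); a runtime would hide everything else.
        Histogram analyze_time_step(const std::vector<double>& field, int n_threads) {
            std::vector<Histogram> replicas(n_threads);
            #pragma omp parallel for num_threads(n_threads)
            for (int t = 0; t < n_threads; ++t) {
                std::size_t begin = field.size() * t / n_threads;
                std::size_t end   = field.size() * (t + 1) / n_threads;
                for (std::size_t i = begin; i < end; ++i)
                    replicas[t].accumulate(field[i], 0.0, 1.0);
            }
            Histogram result;
            for (const Histogram& r : replicas) result.merge(r);
            return result;
        }

        int main() {
            std::vector<double> field(1 << 20);               // stand-in for simulation output
            for (std::size_t i = 0; i < field.size(); ++i) field[i] = (i % 1000) / 1000.0;
            Histogram h = analyze_time_step(field, 4);
            for (std::size_t b = 0; b < h.bins.size(); ++b)
                std::printf("bin %zu: %ld\n", b, h.bins[b]);
            return 0;
        }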

  5. Challenges
     • Hard to adapt MapReduce (MR) to the in-situ environment
       – MR is not designed for in-situ analytics
     • Four mismatches
       – Data Loading Mismatch
       – Programming View Mismatch
       – Memory Constraint Mismatch
       – Programming Language Mismatch

  6. System Overview
     • In-Situ System = Shared-Memory System + Combination
     • In-Situ System = Distributed System – Partitioning
     (Diagram: relationship among the in-situ system, the distributed system, and the shared-memory system.)
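
     One way to read the equations above: within a node the analytics behaves like a shared-memory
     reduction over data that is already resident, and only a combination of small reduction objects
     crosses the network; the data loading and partitioning steps of a conventional distributed
     framework disappear. A minimal sketch of that combination step with generic MPI (not Smart's
     internal code):

        #include <mpi.h>
        #include <vector>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);

            std::vector<long> local_hist(10, 0);
            // ... a local, shared-memory reduction over this node's resident time step
            //     would fill local_hist here; no data partitioning or loading occurs ...

            // Global combination: the only communication is merging the small
            // per-node reduction objects.
            std::vector<long> global_hist(10, 0);
            MPI_Allreduce(local_hist.data(), global_hist.data(), 10,
                          MPI_LONG, MPI_SUM, MPI_COMM_WORLD);

            MPI_Finalize();
            return 0;
        }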

  7. Two In-Situ Modes
     • Time Sharing Mode: minimizes memory consumption
     • Space Sharing Mode: enhances resource utilization when the simulation reaches its scalability bottleneck
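
     A rough control-flow sketch of the two modes; the scheduling and buffering policy here are
     simplified assumptions for illustration, not the actual Smart scheduler. In time sharing,
     analytics borrows the simulation's cores between time steps and only one copy of the step is
     live; in space sharing, analytics runs concurrently on separate cores, which helps once the
     simulation stops scaling but requires buffering a copy of the step.

        #include <future>
        #include <vector>

        // Placeholder simulation and analytics steps, for illustration only.
        std::vector<double> simulate_step(int t) { return std::vector<double>(1 << 20, t * 0.5); }
        double analyze(const std::vector<double>& field) {
            double s = 0.0;
            for (double v : field) s += v;
            return s;
        }

        int main() {
            const int n_steps = 4;

            // Time sharing mode: analytics reuses the simulation's cores after each
            // time step, so only one copy of the data is live (low memory footprint).
            for (int t = 0; t < n_steps; ++t) {
                std::vector<double> field = simulate_step(t);
                analyze(field);                    // runs in between simulation steps
            }

            // Space sharing mode: analytics runs concurrently on other cores while the
            // next step is simulated, at the cost of buffering a copy of each step.
            std::future<double> pending;
            for (int t = 0; t < n_steps; ++t) {
                std::vector<double> field = simulate_step(t);
                if (pending.valid()) pending.get();            // finish the previous step's analysis
                pending = std::async(std::launch::async, analyze, std::move(field));
            }
            if (pending.valid()) pending.get();
            return 0;
        }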

  8. Smart vs. Spark
     • To make a fair comparison:
       – Bypass the programming view mismatch: run on an 8-core node, multi-threaded but not distributed
       – Bypass the memory constraint mismatch: use a simulation emulator that consumes little memory
       – Bypass the programming language mismatch: rewrite the simulation in Java and compare only computation time
     • 40 GB input, 0.5 GB per time step
     (Charts: K-Means and Histogram computation times for Smart vs. Spark on 1, 2, 4, and 8 threads; Smart outperforms Spark by roughly 62x and 92x on the two applications.)

  9. Smart vs. Low-Level Implementations
     • Setup
       – Smart: time sharing mode; low-level: OpenMP + MPI
       – Apps: k-means and logistic regression
       – 1 TB input on 8 to 64 nodes
     • Programmability
       – 55% and 69% of the parallel code is either eliminated or converted into sequential code
     • Performance
       – Up to 9% extra overhead for k-means
       – Nearly unnoticeable overhead for logistic regression
     (Charts: K-Means and Logistic Regression computation times for Smart vs. the low-level implementations on 8, 16, 32, and 64 nodes.)

  10. Tomography at the Advanced Photon Source (EuroPar'15)

  11. Tomographic Image Reconstruction
     • Analysis of tomographic datasets is challenging
     • Long image reconstruction/analysis time
       – E.g., 12 GB of data, 12 hours with 24 cores
       – Different reconstruction algorithms lead to longer computation times
     • Input dataset < output dataset
       – E.g., 73 MB vs. 476 MB
     • Parallelization using MATE+
       – Predecessor of the Smart system

  12. Mapping to a MapReduce-like API
     Inputs:
       IS    : assigned projection slices
       Recon : reconstruction object
       dist  : subsetting distance
     Output:
       Recon : final reconstruction object

     /* (Partial) iteration i */
     For each assigned projection slice, is, in IS {
       IR = GetOrderedRaySubset(is, i, dist);
       For each ray, ir, in IR {
         (k, off, val) = LocalRecon(ir, Recon(is));
         ReconRep(k) = Reduce(ReconRep(k), off, val);
       }
     }
     /* Combine updated replicas */
     Recon = PartialCombination(ReconRep)
     /* Exchange and update adjacent slices */
     Recon = GlobalCombination(Recon)

     (Diagram: in iteration i, threads 0..m on each node run the local reconstruction phase over their projection inputs and Recon[i-1], folding updates into replicas ReconRep; a partial combination phase merges the replicas within each node, and a global combination phase across nodes 0..n produces Recon[i].)
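
     The pseudocode above folds per-ray updates into a keyed replica of the reconstruction object
     and then combines the replicas. The C++ skeleton below mirrors only that reduction structure;
     the Ray type, LocalRecon, and the data values are placeholders, and the actual MATE+/Smart
     reconstruction and inter-node slice exchange are not shown.

        #include <cstddef>
        #include <map>
        #include <tuple>
        #include <vector>

        struct Ray { int slice; int index; };                 // stand-in ray descriptor

        // Hypothetical local reconstruction of one ray: returns (key, offset, value)
        // to be folded into the keyed reconstruction replica, as in the pseudocode.
        std::tuple<int, int, double> LocalRecon(const Ray& ir, const std::vector<double>& slice) {
            return {ir.slice, ir.index % static_cast<int>(slice.size()), 0.1};
        }

        int main() {
            std::vector<std::vector<double>> Recon(4, std::vector<double>(256, 0.0));  // per-slice object
            std::map<int, std::vector<double>> ReconRep;                               // keyed replicas

            std::vector<Ray> IR = {{0, 3}, {1, 7}, {0, 12}};  // stand-in for GetOrderedRaySubset(...)
            for (const Ray& ir : IR) {
                auto [k, off, val] = LocalRecon(ir, Recon[ir.slice]);
                auto& rep = ReconRep.try_emplace(k, 256, 0.0).first->second;
                rep[off] += val;                              // Reduce(ReconRep(k), off, val)
            }

            // PartialCombination: fold the replicas back into the reconstruction object.
            for (const auto& [k, rep] : ReconRep)
                for (std::size_t i = 0; i < rep.size(); ++i) Recon[k][i] += rep[i];

            // GlobalCombination (exchange/update of adjacent slices across nodes) omitted.
            return 0;
        }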

  13. In Situ Analysis
     • How do we decide what data to save?
       – This analysis cannot take too much time or memory
       – Simulations already consume most of the available memory
       – Scientists cannot accept much slowdown for analytics
     • How can insights be obtained in situ?
       – Must be memory- and time-efficient
     • What representation should be used for the data stored on disk?
       – Effective for analysis/visualization
       – Disk- and network-efficient

  14. Specific Issues
     • Bitmaps as data summarization
       – Utilize extra compute power for data reduction
       – Save memory usage, disk I/O, and network transfer time
     • In-situ data reduction: generating bitmaps in situ
       – Bitmap generation is time-consuming
       – Bitmaps before compression have a large memory cost
     • In-situ data analysis: time step selection
       – Can bitmaps support time step selection?
       – How efficient is time step selection using bitmaps?
     • Offline analysis
       – Only keep the bitmaps instead of the data
       – What types of analysis can bitmaps support?
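
     A minimal sketch of building a bitmap index as the in-memory summary. The equal-width binning,
     64-bit word layout, and lack of compression are simplifying assumptions; production bitmap
     indexing normally applies run-length/WAH-style compression, which is what keeps the memory cost
     noted above manageable.

        #include <bit>
        #include <cstddef>
        #include <cstdint>
        #include <vector>

        // Minimal bitmap index: one bit-vector per value bin, where bit i of bin b
        // records whether grid point i falls into bin b.
        struct BitmapIndex {
            int num_bins;
            std::vector<std::vector<std::uint64_t>> bitmaps;   // bitmaps[bin][word]

            BitmapIndex(int bins, std::size_t n)
                : num_bins(bins), bitmaps(bins, std::vector<std::uint64_t>((n + 63) / 64, 0)) {}

            void build(const std::vector<double>& field, double lo, double hi) {
                for (std::size_t i = 0; i < field.size(); ++i) {
                    int b = static_cast<int>((field[i] - lo) / (hi - lo) * num_bins);
                    if (b < 0) b = 0;
                    if (b >= num_bins) b = num_bins - 1;
                    bitmaps[b][i / 64] |= (std::uint64_t{1} << (i % 64));
                }
            }

            // Per-bin counts can be recovered from the bitmaps alone, which is all the
            // entropy-based time step selection on the next slides needs.
            std::vector<long> bin_counts() const {
                std::vector<long> counts(num_bins, 0);
                for (int b = 0; b < num_bins; ++b)
                    for (std::uint64_t w : bitmaps[b]) counts[b] += std::popcount(w);
                return counts;
            }
        };

        int main() {
            std::vector<double> field(100000);                 // stand-in for one time step
            for (std::size_t i = 0; i < field.size(); ++i) field[i] = (i % 500) / 500.0;
            BitmapIndex idx(16, field.size());
            idx.build(field, 0.0, 1.0);
            std::vector<long> counts = idx.bin_counts();       // usable offline, without the raw data
            (void)counts;
            return 0;
        }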

  15. Time-Step Selection
     • Full data: computing correlation metrics is slow, and I/O to storage devices is slow
     • Bitmaps: computing correlation metrics is fast, and I/O to storage devices is fast
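
     The point of the bitmap path is that a correlation metric such as the conditional entropy used
     in the next slide's experiments can be evaluated from bin memberships alone, without rereading
     the full data. A simplified sketch, assuming the per-point bin IDs of two time steps have
     already been decoded from their bitmaps (with bitmap indices, the joint counts could instead
     be obtained by ANDing bit-vectors):

        #include <cmath>
        #include <cstddef>
        #include <vector>

        // Conditional entropy H(Y|X) of time step Y's bins given time step X's bins,
        // computed from the joint bin-membership histogram.
        double conditional_entropy(const std::vector<int>& bins_x,
                                   const std::vector<int>& bins_y, int num_bins) {
            std::vector<double> joint(num_bins * num_bins, 0.0), px(num_bins, 0.0);
            const double n = static_cast<double>(bins_x.size());
            for (std::size_t i = 0; i < bins_x.size(); ++i) {
                joint[bins_x[i] * num_bins + bins_y[i]] += 1.0 / n;
                px[bins_x[i]] += 1.0 / n;
            }
            double h = 0.0;
            for (int x = 0; x < num_bins; ++x)
                for (int y = 0; y < num_bins; ++y) {
                    double pxy = joint[x * num_bins + y];
                    if (pxy > 0.0) h -= pxy * std::log2(pxy / px[x]);
                }
            return h;   // low H(Y|X) means step Y adds little new information over step X
        }

        int main() {
            std::vector<int> step_a = {0, 0, 1, 2, 2, 3};      // toy bin IDs for two time steps
            std::vector<int> step_b = {0, 1, 1, 2, 3, 3};
            double h = conditional_entropy(step_a, step_b, 4);
            (void)h;
            return 0;
        }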

  16. Efficiency Comparison for In-Situ Analysis on MIC
     • MIC: more cores, lower bandwidth
     • Full data (original): huge data-writing time
     • Bitmaps:
       – Good scalability for both bitmap generation and time step selection using bitmaps
       – Much smaller data-writing time
     • Overall: 0.81x to 3.28x
     • Setup: Heat3D simulation on a MIC processor; select 25 of 100 time steps; 1.6 GB per time step (200*1000*1000); metric: conditional entropy
