
Exa-DM: Enabling Scientific Discovery in Exascale Simulations

Jeremy Iverson (1, 2), Ya Ju Fan (1), George Karypis (2), Chandrika Kamath (1). (1) Lawrence Livermore National Laboratory; (2) University of Minnesota. DOE Exascale Research Conference, October


  1. Exa-DM: Enabling Scientific Discovery in Exascale Simulations
     Jeremy Iverson (1, 2), Ya Ju Fan (1), George Karypis (2), Chandrika Kamath (1)
     (1) Lawrence Livermore National Laboratory; (2) University of Minnesota
     DOE Exascale Research Conference, October 2012. LLNL-PRES-584212.
     This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
     Chandrika Kamath (LLNL), Exa-DM: Scientific Discovery at the Exascale, 1 / 20

  2. Outline: (1) Motivation; (2) In-situ analysis; (3) Compression using graph-based clustering; (4) Compressed sensing; (5) Next steps

  3. Motivation: How will analysis of simulation output change at the exascale?

  4. Motivation. Present: write out the simulation output for analysis.
     Problem: identify and track coherent structures in plasma turbulence.
     - 64 poloidal planes, 600,000 grid points, 10 variables per grid point
     - 8000 time steps, output every 5 time steps
     - Unstructured mesh, 2.5 TB of data
     - Need a single analysis algorithm and parameters for all time steps.
     - The results may be unexpected...
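As a rough consistency check on the numbers quoted on this slide, the output volume can be estimated with simple arithmetic. The sketch below rests on two assumptions that the slide does not state: that the 600,000 grid points are per poloidal plane, and that each variable is stored as a 4-byte single-precision value.

```python
# Back-of-the-envelope estimate of the simulation output size.
# Assumptions (not stated on the slide): 600,000 grid points PER poloidal plane,
# and one 4-byte single-precision value per variable.
planes = 64
points_per_plane = 600_000
variables = 10
time_steps = 8000
output_interval = 5
bytes_per_value = 4

# Output is written every 5th time step.
snapshots = time_steps // output_interval

# Values written in one snapshot, then total bytes over the whole run.
values_per_snapshot = planes * points_per_plane * variables
total_bytes = snapshots * values_per_snapshot * bytes_per_value
total_tb = total_bytes / 1e12

print(f"{snapshots} snapshots, about {total_tb:.2f} TB")
```

Under these assumptions the estimate comes to roughly 2.46 TB, which is in line with the 2.5 TB quoted on the slide.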

  5. Motivation: The present approach will not work at the exascale.
     [Figure: present versus future workflow]
     Exa-DM: Find ways of intelligently reducing the size of the output so we can still enable scientific discovery at the exascale.

  6. Motivation: We are investigating several different solutions.
     - Move the analysis in-situ, but need to know which analysis algorithm and parameters to use; modify algorithms (low memory sizes and high cost of data movement); co-exist with the simulation
     - Exploit similarity between coherent structures and clustering
     - Create general reduced representations, such as compressive sensing
     We consider the problem of detection and tracking of coherent structures, which occurs in fusion, materials science, combustion, and other domains.
     A collaboration with Zhihong Lin (UCI, GSEP SciDAC PI) and Sean Garrick (UMN), who provide data and domain expertise in fusion and chemically reacting flows, respectively.

  7. In-situ analysis: An in-situ version of a threshold-based algorithm

  8. In-situ analysis: Threshold-based algorithm to extract coherent structures
     - Calculate and apply the threshold
     - Extract the structures using connected component analysis
     [Figures: given input; desired output]
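The two steps on this slide (apply a threshold, then extract structures by connected component analysis) can be illustrated with a minimal serial sketch. It makes several assumptions that are not from the slides: the data sit on a small regular 2D grid rather than the unstructured mesh used in the actual problem, cells are 4-connected, and the threshold is passed in rather than calculated. The function name `extract_structures` is hypothetical.

```python
from collections import deque

def extract_structures(field, threshold):
    """Label 4-connected regions of cells whose value exceeds `threshold`.

    Returns (labels, count); label 0 marks background cells.
    Illustrative only: a regular 2D grid stands in for the real mesh.
    """
    rows, cols = len(field), len(field[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Skip background cells and cells that already belong to a structure.
            if field[r][c] <= threshold or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over above-threshold neighbors.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and field[ny][nx] > threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

field = [
    [0.1, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.6],
    [0.5, 0.0, 0.0, 0.9],
]
labels, count = extract_structures(field, threshold=0.4)
```

Here three structures are found: the L-shaped group in the top rows, the vertical pair on the right edge, and the isolated cell in the bottom-left corner.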

  9. In-situ analysis: Parallel connected component analysis
     - Identify local connected components
     - Make local labels globally unique
     - Exchange ghost labels and identify cross-PE merges
     - All-to-all exchange of merges
     - Local playback of merges
     [Figure: schematic of connected components in parallel †]
     Work in progress: We are investigating variants of the basic method and analyzing their scalability.
     † Schematic adapted from C. Harrison, H. Childs, and K.P. Gaither, “Data-parallel mesh connected components labeling and analysis,” Eurographics Symposium on Parallel Graphics and Visualization, 2011.
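The five steps of the parallel method can be mimicked in a single process to show how the pieces fit together. This is a toy sketch under stated assumptions, not the implementation being investigated: the domain is a 1D row of cells split into contiguous chunks (one per PE), the ghost exchange is reduced to comparing the boundary cells of neighboring chunks, and the all-to-all exchange is replaced by a shared merge list. All function names are hypothetical.

```python
def local_components(mask):
    """Step 1: label runs of True cells in one PE's chunk (0 = background)."""
    labels, current, prev = [], 0, False
    for cell in mask:
        if cell and not prev:
            current += 1
        labels.append(current if cell else 0)
        prev = cell
    return labels, current

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def parallel_components(chunks):
    # Step 1: every PE labels its own chunk independently.
    local = [local_components(chunk) for chunk in chunks]

    # Step 2: shift local labels by a running offset so they are globally unique.
    offset, labeled = 0, []
    for labels, count in local:
        labeled.append([lab + offset if lab else 0 for lab in labels])
        offset += count

    # Step 3: compare boundary ("ghost") cells of neighboring PEs to find merges.
    merges = []
    for left, right in zip(labeled, labeled[1:]):
        if left and right and left[-1] and right[0]:
            merges.append((left[-1], right[0]))

    # Step 4: the shared `merges` list stands in for the all-to-all exchange.
    # Step 5: replay the merges with union-find and relabel each chunk.
    parent = list(range(offset + 1))
    for a, b in merges:
        ra, rb = find(parent, a), find(parent, b)
        parent[max(ra, rb)] = min(ra, rb)
    return [[find(parent, lab) if lab else 0 for lab in labels]
            for labels in labeled]

chunks = [
    [True, True, False, True],   # PE 0
    [True, False, True, True],   # PE 1
    [True, False, False, True],  # PE 2
]
result = parallel_components(chunks)
```

In this example the component that ends PE 0 continues into PE 1, and the one that ends PE 1 continues into PE 2, so both pairs end up sharing a single global label after playback.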

  10. Compression using graph-based clustering: Fast algorithms for compression using graph-based clustering

  11. Compression using graph-based clustering. Goal: reduce the data required for accurate reconstruction
      - Exploit the grid topology and local smoothness
      - Model the grid as a graph
      - Decompose graph into sets of vertices satisfying an error constraint
      - Represent each set of vertices by a single value
      - Encode sets and their representative values
      - Compress encoding using auxiliary compression program
      Variants: set-based (use only values) and region-based (also use topology)
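To make the idea of "sets of vertices satisfying an error constraint" concrete, here is a minimal sketch of a set-based grouping that uses only the values and ignores topology. It is an illustrative greedy scheme chosen for brevity, not the authors' algorithm (see the Euro-Par 2012 paper for that). The function name `set_based_clusters`, the absolute-error constraint `max_error`, and the choice of the range midpoint as the representative are all assumptions.

```python
def set_based_clusters(values, max_error):
    """Greedily group vertices by value so that every value lies within
    `max_error` of its set's representative (the midpoint of the set's range).

    Returns (assignment, representatives), where assignment[i] is the index
    of the set that vertex i belongs to.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    assignment = [0] * len(values)
    representatives = []
    start = 0
    for pos in range(1, len(order) + 1):
        # Close the current set at the end of the data, or when the next value
        # would stretch the set's range beyond 2 * max_error.
        at_end = pos == len(order)
        if at_end or values[order[pos]] - values[order[start]] > 2 * max_error:
            lo, hi = values[order[start]], values[order[pos - 1]]
            for i in order[start:pos]:
                assignment[i] = len(representatives)
            representatives.append((lo + hi) / 2)
            start = pos
    return assignment, representatives

values = [1.00, 1.04, 5.00, 1.08, 5.06, 9.00]
assignment, representatives = set_based_clusters(values, max_error=0.05)

# Lossy reconstruction: every vertex takes its set's representative value.
reconstructed = [representatives[a] for a in assignment]
```

Six values are reduced to three representatives plus a set index per vertex. A region-based variant would additionally require each set to be connected in the grid graph.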

  12. Compression using graph-based clustering: We considered two encodings, scalar quantization and differential encoding.
      [Figures: scalar quantization; differential encoding]
      For more details: J. Iverson, C. Kamath, G. Karypis, “Fast and effective lossy compression algorithms for scientific datasets,” Euro-Par, Rhodes Island, Greece, 2012.

  13. Compression using graph-based clustering: We outperform state-of-the-art lossy compression methods.
      Observations on 7 datasets (structured and unstructured):
      - Can compress to 2-5% of original size
      - Set-based outperforms at all PSNR levels
      - At lower reconstruction error, gap between set-based and other methods grows

  14. Compressed sensing: General reduced representations using compressed sensing

  15. Compressed sensing (CS): applicable to data that are sparse in some basis
      - Signal $X$ of length $N$ is sparse: if $K$ = number of nonzeros, then $K \ll N$
      - Generate a reduced representation of $X$: $Y = \Phi X$, where $\Phi \in \mathbb{R}^{M \times N}$ and $M \approx K$
      - Recover the signal by solving a linear programming problem: $\min_X \|X\|_1$ such that $Y = \Phi X$
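The slide calls the recovery step a linear programming problem, although the $\ell_1$ norm itself is not linear. A standard way to see the connection (a textbook reformulation, not taken from the slides) is to introduce auxiliary variables $t_i$ that bound $|X_i|$; the symbols $t_i$ are introduced here purely for illustration:

```latex
\begin{aligned}
\min_{X,\,t}\quad & \sum_{i=1}^{N} t_i \\
\text{such that}\quad & Y = \Phi X, \\
& -t_i \le X_i \le t_i, \qquad i = 1, \dots, N .
\end{aligned}
```

At the optimum each $t_i$ equals $|X_i|$, so the objective equals $\|X\|_1$ while both the objective and all constraints are linear in $(X, t)$.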

  16. Compressed sensing: How do we select Φ?
      - Our data are sparse
      - The sensing matrix Φ is chosen to be a random matrix †.
      - Related to random projections (machine learning), sketches (data management), and the Johnson-Lindenstrauss Lemma (theoretical computer science).
      † Figure from R. Baraniuk, “Compressive Signal Processing,” Talk presented at the CASIS workshop, LLNL, May 2012; available from casis.llnl.gov.
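A small sketch of the measurement step Y = ΦX with a random sensing matrix may help fix ideas. The slide says only that Φ is a random matrix; the choice of independent Gaussian entries scaled by 1/sqrt(M), the sizes used, and the function names `random_sensing_matrix` and `measure` are assumptions made for this illustration. Recovery of X from Y is not shown.

```python
import random

def random_sensing_matrix(m, n, seed=0):
    """Return an m x n matrix of i.i.d. Gaussian entries with std 1/sqrt(m).

    Gaussian entries are an assumption; the slide specifies only 'random'.
    """
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]

def measure(phi, x):
    """Compute the reduced representation y = phi @ x using plain lists."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in phi]

# A sparse signal of length N = 50 with K = 3 nonzeros.
n, m = 50, 12
x = [0.0] * n
x[3], x[17], x[41] = 2.0, -1.5, 0.7

phi = random_sensing_matrix(m, n)
y = measure(phi, x)  # M = 12 numbers stand in for the N = 50 original values
```

Because the measurement is linear, scaling or adding signals scales or adds their measurements, which is one reason the same Φ can be reused for every signal.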

  17. Compressed sensing: Our early results indicate that this works!
      [Figure panels: original signal (N = 2524, K = 531); reconstructions with M = 900, M = 1050, and M = 1100]

      |                 | M = 900 | M = 1050 | M = 1100 |
      |-----------------|---------|----------|----------|
      | $R^2$           | 0.59    | 0.86     | 1.0      |
      | $\|error\|_2$   | 6.08    | 3.486    | 0.0016   |

      Here error = original − reconstructed, $R^2 = 1.0 - \frac{\sum_i (x_i - \hat{x}_i)^2}{\sum_i (x_i - \bar{x})^2}$, and $\bar{x} = \frac{1}{N} \sum_i x_i$.
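The two quality measures reported on this slide can be computed as follows. The sketch assumes the usual coefficient-of-determination form of R² (one minus the ratio of the residual sum of squares to the total sum of squares about the mean) and the Euclidean norm for the error; the function names and the toy data are hypothetical and unrelated to the results above.

```python
import math

def l2_error(original, reconstructed):
    """Euclidean norm of (original - reconstructed)."""
    return math.sqrt(sum((x - xh) ** 2 for x, xh in zip(original, reconstructed)))

def r_squared(original, reconstructed):
    """R^2 = 1 - sum((x_i - xhat_i)^2) / sum((x_i - mean)^2)."""
    mean = sum(original) / len(original)
    ss_res = sum((x - xh) ** 2 for x, xh in zip(original, reconstructed))
    ss_tot = sum((x - mean) ** 2 for x in original)
    return 1.0 - ss_res / ss_tot

# Toy data for illustration only.
original = [0.0, 2.0, 0.0, 4.0]
reconstructed = [0.0, 2.0, 1.0, 3.0]

err = l2_error(original, reconstructed)
r2 = r_squared(original, reconstructed)
```

A perfect reconstruction gives an error of 0 and an R² of 1.0, which matches the trend in the table as M increases.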

  18. Next steps

  19. Next steps: We continue our investigations into potential solutions.
      - Run parallel versions of the connected components algorithms to generate scaling results.
      - Extend clustering to multivariate and temporal data.
      - Understand better the applicability and effectiveness of compressed sensing in the context of our problem.
      - Compare different compression techniques to evaluate their scalability, accuracy, and applicability to datasets from simulations.
      Our goal: Intelligently reduce the amount of data written out by extreme-scale simulations while still enabling scientists to perform data analysis and scientific discovery.

  20. Next steps: Acknowledgments
      - Zhihong Lin (UCI) and Sean Garrick (UMN) for data and domain expertise, as well as others who provided datasets in the public domain.
      - Funding sources: ASCR Exascale Program
      - For more details: https://computation.llnl.gov/casc/sapphire/ and https://computation.llnl.gov/casc/StarSapphire/
      - Chandrika Kamath, kamath2@llnl.gov
