
LSST: Petascale opportunities and challenges. Tony Tyson, University of California, Davis



  1. LSST: Petascale opportunities and challenges. Tony Tyson, University of California, Davis

  2. [Slide with no recoverable text]

  3. Relative data volume from survey telescopes & cameras [Chart: étendue (m² deg²), axis ticks from 1 to 1000; labels “Max” and “Survey”]

  4. The new sky
     • Probing Dark Matter and Dark Energy
     • Mapping the Milky Way
     • Finding Near Earth Asteroids

  5. Data volumes & rates are unprecedented in astronomy
     [Chart: Estimated Nightly Data Volume in GB, axis 0 to 20000; Raw and Catalog; LSST, Pan-STARRS 4, SDSS]
     LSST will make tens of trillions of photometric observations of tens of billions of objects

  6. DM System is widely distributed
     • Archive Site: Archive Center; co-located Data Access Center (DAC)
     • Headquarters Site: Systems Operations Center (SOC); Education and Public Outreach Center (EPOC)
     • Base Site: Base Center; co-located Data Access Center (DAC)
     • Site: a physical location/space that hosts DM centers, connected via dedicated, protected fiber optic circuits
     • Center: a DM functional capability hosted at a Site
     NSF Review, December 15-17, 2009, Tucson, AZ

  7. DM System relies on large-scale computational parallelism
     • With few exceptions, LSST pipeline processing is “embarrassingly parallel”
       – 3024 parallel image readouts
       – O(10^8) sky tiles
       – O(10^9) objects
     • Computational clusters are well matched to the available parallelism
       – 5000 cores at Base
       – 12000 (yr1) to 33000 (yr10) cores at Archive
     • Middleware implements flexible pipeline/production model of parallelism
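As a rough illustration of the “embarrassingly parallel” pattern named on this slide, the sketch below maps an independent per-readout function over 3024 readouts with a worker pool. Only the count 3024 comes from the slide; `process_readout`, its placeholder arithmetic, and the worker count are hypothetical stand-ins and not LSST middleware.

```python
from concurrent.futures import ThreadPoolExecutor

N_READOUTS = 3024  # parallel image readouts, as quoted on the slide


def process_readout(readout_id: int) -> tuple[int, int]:
    """Hypothetical per-readout task; nothing is shared between readouts."""
    # Placeholder arithmetic standing in for real image processing.
    checksum = sum(range(readout_id % 100))
    return readout_id, checksum


def process_exposure(n_workers: int = 8) -> dict[int, int]:
    """Fan the independent readouts out over a pool and gather the results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return dict(pool.map(process_readout, range(N_READOUTS)))


if __name__ == "__main__":
    results = process_exposure()
    print(len(results), "readouts processed")
```

A thread pool keeps the sketch portable; for CPU-bound work a process pool or a cluster scheduler would be the more usual choice, since the tasks need no communication with each other.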

  8. DATA PRODUCTS CLASSIFICATION

  9. IMAGE SIMULATIONS
     [Flow diagram; recoverable labels follow]
     • All Sky Database / Base Catalog: Cosmology, Extended Sources, Milky Way, Solar System, Transients, Defects
     • Instance Catalog Generation: “Generate the seed catalog as required for simulation. Includes: metadata, size, position, type, variability, color, brightness, proper motion”
     • Operation Simulation
     • Source Image Generation: “Introduce shear parameter from cosmology metadata”
     • Photon Propagation: Atmosphere, Telescope, Camera, Defects
     • Formatting: “Generate per FOV”, “Generate per sensor”
     • Calibration Simulation
     • DM database load simulation
     • DM Pipelines
     • LSST Sample Images and Catalogs

  10. Full end-to-end simulations

  11. The Data Challenge
      • 3 Terabytes per hour that must be mined in real time.
      • 20 billion objects will be monitored for important variations in real time.
      • A new approach must be developed for knowledge extraction in real time.

  12. The Data Challenge
      • ~3 Terabytes per hour that must be mined in real time.
      • 20 billion objects will be monitored for important variations in real time.
      • A new approach must be developed for knowledge extraction in real time.
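To put the quoted rate in more familiar units, the sketch below converts terabytes per hour into a sustained megabytes-per-second figure. It assumes decimal units (1 TB = 10^12 bytes), which the slide does not specify; the function name is illustrative.

```python
SECONDS_PER_HOUR = 3600
BYTES_PER_TB = 10**12  # assumption: decimal terabytes; the slide does not say
BYTES_PER_MB = 10**6


def sustained_rate_mb_per_s(tb_per_hour: float) -> float:
    """Convert a data rate in TB/hour to MB/s."""
    return tb_per_hour * BYTES_PER_TB / SECONDS_PER_HOUR / BYTES_PER_MB


if __name__ == "__main__":
    # 3 TB/hour is the figure quoted on the slide.
    print(f"{sustained_rate_mb_per_s(3.0):.0f} MB/s")  # prints "833 MB/s"
```

Under that assumption, any real-time mining step has to keep up with roughly 833 MB/s of incoming data for as long as observing continues.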

  13. LSST

  14. LSST

  15. Analytics
      • Complex computations
        – 100s of attributes per query
      • Iterative, successively more restrictive
      • Curiosity-driven questions
      • 3 major query types
        – Needle in haystack
        – Correlations
        – Time series
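The three query types listed on this slide can be sketched against a toy catalog. Everything below is invented for illustration: the `detections` table, its columns, and its six rows are assumptions, and SQLite stands in for whatever database the survey actually uses.

```python
import sqlite3


def build_toy_catalog() -> sqlite3.Connection:
    """Create a tiny in-memory table; schema and values are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE detections (object_id INTEGER, mjd REAL, mag REAL, color REAL)"
    )
    rows = [
        (1, 55000.0, 20.1, 0.50),
        (1, 55003.0, 20.0, 0.52),
        (1, 55006.0, 17.2, 0.49),  # sudden brightening
        (2, 55000.0, 22.3, 1.10),
        (2, 55003.0, 22.4, 1.12),
        (3, 55000.0, 21.0, 0.80),
    ]
    conn.executemany("INSERT INTO detections VALUES (?, ?, ?, ?)", rows)
    return conn


def needle_in_haystack(conn: sqlite3.Connection, mag_limit: float) -> list:
    """Needle in haystack: the few detections brighter than a limit."""
    sql = "SELECT object_id, mjd FROM detections WHERE mag < ? ORDER BY mjd"
    return conn.execute(sql, (mag_limit,)).fetchall()


def mag_color_correlation(conn: sqlite3.Connection) -> float:
    """Correlations: Pearson coefficient between two attributes."""
    pairs = conn.execute("SELECT mag, color FROM detections").fetchall()
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / (var_x * var_y) ** 0.5


def light_curve(conn: sqlite3.Connection, object_id: int) -> list:
    """Time series: all detections of one object in time order."""
    sql = "SELECT mjd, mag FROM detections WHERE object_id = ? ORDER BY mjd"
    return conn.execute(sql, (object_id,)).fetchall()


if __name__ == "__main__":
    catalog = build_toy_catalog()
    print(needle_in_haystack(catalog, 18.0))
    print(mag_color_correlation(catalog))
    print(light_curve(catalog, 1))
```

The “iterative, successively more restrictive” pattern would correspond to rerunning such queries with tighter `WHERE` clauses on the previous result set.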

  16. Science at the Limit
      • Much of the breakthrough science using surveys (imaging or spectroscopy) has occurred at the limits of the surveys
      • Sample incompleteness
      • Systematic errors

  17. LSST Wide-Fast-Deep survey
      • 4 billion galaxies with redshifts
      • Time domain:
        – 1 million supernovae
        – 1 million galaxy lenses
        – 1 billion moving objects
        – new phenomena

  18. LSST Wide-Fast-Deep survey
      • 4 billion galaxies with redshifts
      • Time domain:
        – 1 million supernovae
        – 1 million galaxy lenses
        – 1 billion moving objects
        – new phenomena

  19. Major opportunity and challenge:

  20. • Characterize the known (clustering)
      • Assign the new (classification)
      • Discover the unknown (outlier detection)
      Tom Vestrand
      Benefits of very large data sets:
      • best statistical analysis of “typical” events
      • automated search for “rare” events
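The three tasks on this slide can be sketched on toy 2-D points. The techniques used here are deliberately simple stand-ins chosen for illustration (plain k-means, nearest-centroid assignment, and a distance threshold); the slide does not say which algorithms the project uses, and the starting centers and threshold are assumptions.

```python
import math


def centroid(points: list) -> tuple:
    """Mean position of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def nearest(point: tuple, centers: list) -> int:
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points: list, start_centers: list, n_iter: int = 10) -> list:
    """Characterize the known: Lloyd's k-means from given starting centers."""
    centers = list(start_centers)
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers


def classify(point: tuple, centers: list) -> int:
    """Assign the new: label a new point with its nearest cluster."""
    return nearest(point, centers)


def is_outlier(point: tuple, centers: list, max_dist: float) -> bool:
    """Discover the unknown: flag points far from every known cluster."""
    return min(math.dist(point, c) for c in centers) > max_dist


if __name__ == "__main__":
    known = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers = kmeans(known, [(0, 0), (10, 10)])
    print(classify((9, 9), centers), is_outlier((50, -50), centers, 3.0))
```

With very large data sets the cluster statistics become well determined, which is what makes a fixed distance threshold for “rare” events meaningful in the first place.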

  21. The dimension reduction problem: finding correlations and “fundamental planes” of parameters
      • The Curse of High Dimensionality!
        – Are there combinations (linear or non-linear functions) of observational parameters that correlate strongly with one another?
        – Are there eigenvectors or condensed representations (e.g., basis sets) that represent the full set of properties?
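One way to look for the eigenvectors this slide asks about is principal component analysis. The sketch below is a minimal pure-Python version that finds only the leading component by power iteration on the covariance matrix; it is an illustrative choice, not a method attributed to the talk, and the three-parameter toy data are invented.

```python
import math


def covariance_matrix(rows: list) -> list:
    """Sample covariance of rows, where each row is one object's parameters."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_component(cov: list, n_iter: int = 200) -> tuple:
    """Power iteration: returns (eigenvalue, unit eigenvector).

    Assumes a non-zero covariance matrix and a start vector that is not
    orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))
    return eigenvalue, v


def explained_fraction(cov: list, eigenvalue: float) -> float:
    """Share of the total variance carried by one component."""
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))


if __name__ == "__main__":
    # Hypothetical objects whose three parameters all track one hidden variable.
    data = [(t, 2.0 * t, 3.0 * t) for t in range(10)]
    cov = covariance_matrix(data)
    value, vector = leading_component(cov)
    print(vector, explained_fraction(cov, value))
```

When a single component explains nearly all the variance, as in this toy data, the parameters lie close to a line; a “fundamental plane” corresponds to two components doing so.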

  22. Automated discovery
      Data exploration
      This is also required for automated Data Quality Assessment

  23. How To Learn More / Get Involved?
      • LSST: lsst.org
        – Check out LSST database trac at http://dev.lsstcorp.org/trac/wiki/LSSTDatabase
      • XLDB
        – XLDB4 (Oct 6-7 @ SLAC): open conference starting this year
        – Read past XLDB reports: http://www-conf.slac.stanford.edu/xldb
        – Share your use cases, join the community
      • SciDB: 1st public release
        – Check out http://scidb.org
        – Try it out
