LSST: Petascale opportunities and challenges
Tony Tyson, University of California, Davis
Relative data volume from survey telescopes & cameras
[Figure: etendue (m² deg²) per survey, logarithmic axis from 1 to 1000]
The new sky
• Probing Dark Matter and Dark Energy
• Mapping the Milky Way
• Finding Near Earth Asteroids
Data volumes & rates are unprecedented in astronomy
[Figure: Estimated Nightly Data Volume (GB), axis 0 to 20000; raw and catalog data for LSST, Pan-STARRS 4, and SDSS]
LSST will make tens of trillions of photometric observations of tens of billions of objects
DM System is widely distributed
Archive Site
• Archive Center
• Co-located Data Access Center (DAC)
Headquarters Site
• Systems Operations Center (SOC)
• Education and Public Outreach Center (EPOC)
Base Site
• Base Center
• Co-located Data Access Center (DAC)
Site
• A physical location/space that hosts DM centers
• Connected via dedicated, protected fiber optic circuits
Center
• A DM functional capability hosted at a Site
NSF Review, December 15-17, 2009, Tucson, AZ
DM System relies on large-scale computational parallelism
• With few exceptions, LSST pipeline processing is “embarrassingly parallel”
– 3024 parallel image readouts
– O(10^8) sky tiles
– O(10^9) objects
• Computational clusters are well matched to the available parallelism
– 5000 cores at Base
– 12000 (yr1) to 33000 (yr10) cores at Archive
• Middleware implements flexible pipeline/production model of parallelism
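A minimal sketch of what “embarrassingly parallel” means in practice: each image readout is processed without reading any state from the others, so the work can be mapped over a pool of workers. The function names and the placeholder arithmetic are hypothetical illustrations, not LSST middleware; only the count of 3024 readouts comes from the slide.

```python
from concurrent.futures import ThreadPoolExecutor

N_READOUTS = 3024  # parallel image readouts quoted on the slide


def process_readout(readout_id: int) -> dict:
    """Hypothetical per-readout step; it uses no data from other readouts."""
    # Placeholder arithmetic standing in for a real pipeline stage.
    pixel_sum = sum((readout_id * 31 + k) % 255 for k in range(100))
    return {"readout": readout_id, "pixel_sum": pixel_sum}


def run_pipeline(n_readouts: int = N_READOUTS, workers: int = 8) -> list:
    """Map the independent step over all readouts; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_readout, range(n_readouts)))


results = run_pipeline()
print(len(results), "readouts processed independently")
```

Because no task depends on another, the same `map` call scales from a few threads to many cores; for CPU-bound work, `ProcessPoolExecutor` offers the same `map` interface.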
DATA PRODUCTS CLASSIFICATION
IMAGE SIMULATIONS
[Figure: simulation flow from the base catalog to LSST sample images and catalogs]
• Base Catalog (All Sky Database): Cosmology, Extended Sources, Milky Way, Solar System, Transients, Defects
• Instance Catalog Generation: generate the seed catalog as required for simulation. Includes: Metadata, Color, Type, Size, Brightness, Variability, Position, Proper motion
• Operation Simulation: generate metadata per FOV
• Source Image Generation: introduce shear parameter from cosmology
• Photon Propagation: Atmosphere, Telescope, Camera, Defects
• Formatting: generate per sensor
• Calibration Simulation
• Outputs: LSST Sample Images and Catalogs, DM Pipelines, DM database load simulation
Full end-to-end simulations
The Data Challenge
• ~3 Terabytes per hour that must be mined in real time.
• 20 billion objects will be monitored for important variations in real time.
• A new approach must be developed for knowledge extraction in real time.
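To put the hourly figure in more familiar units, the short calculation below converts 3 terabytes per hour into a sustained per-second rate. It assumes decimal units (1 TB = 10^12 bytes), which the slide does not specify; the function name is illustrative.

```python
TB = 10**12  # bytes per terabyte; decimal units are an assumption


def sustained_rate(tb_per_hour: float) -> tuple:
    """Return (MB/s, Gbit/s) for a data rate given in TB per hour."""
    bytes_per_second = tb_per_hour * TB / 3600
    mb_per_second = bytes_per_second / 10**6
    gbit_per_second = bytes_per_second * 8 / 10**9
    return mb_per_second, gbit_per_second


mb_s, gbit_s = sustained_rate(3)
print(f"{mb_s:.1f} MB/s, {gbit_s:.2f} Gbit/s")
```

Under that assumption, the stream is roughly 833 MB/s, or about 6.7 Gbit/s, sustained while observing.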
Analytics
• Complex computations
• 100s of attributes per query
• Iterative, successively more restrictive
• Curiosity-driven questions
• 3 major query types
– Needle in haystack
– Correlations
– Time series
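The three query types can be illustrated on a toy table. This is only a sketch using an in-memory SQLite database; the table name `obs`, its columns, and the sample values are hypothetical and far simpler than a real catalog with hundreds of attributes.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (object_id INTEGER, mjd REAL, mag REAL, color REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?, ?, ?)",
    [
        (1, 55000.0, 20.0, 0.5), (1, 55001.0, 20.1, 0.5), (1, 55002.0, 19.9, 0.5),
        (2, 55000.0, 22.0, 1.0), (2, 55001.0, 18.0, 1.0), (2, 55002.0, 21.9, 1.0),
        (3, 55000.0, 24.0, 1.5), (3, 55001.0, 24.1, 1.5),
    ],
)

# 1. Needle in haystack: a restrictive filter that matches very few rows.
needle = conn.execute(
    "SELECT object_id, mjd FROM obs WHERE mag < 19.0 AND color > 0.8"
).fetchall()

# 2. Correlations: compare two per-object attributes across the catalog.
per_object = conn.execute(
    "SELECT AVG(mag), AVG(color) FROM obs GROUP BY object_id ORDER BY object_id"
).fetchall()


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r = pearson([m for m, _ in per_object], [c for _, c in per_object])

# 3. Time series: all observations of one object, ordered by time.
light_curve = conn.execute(
    "SELECT mjd, mag FROM obs WHERE object_id = ? ORDER BY mjd", (2,)
).fetchall()

print(needle, round(r, 3), light_curve)
```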
Science at the Limit
Much of the breakthrough science using surveys (imaging or spectroscopy) has occurred at the limits of the surveys
• Sample incompleteness
• Systematic errors
LSST Wide-Fast-Deep survey
• 4 billion galaxies with redshifts
• Time domain:
– 1 million supernovae
– 1 million galaxy lenses
– 1 billion moving objects
– new phenomena
Major opportunity and challenge:
• Characterize the known (clustering)
• Assign the new (classification)
• Discover the unknown (outlier detection)
Tom Vestrand
Benefits of very large data sets:
• best statistical analysis of “typical” events
• automated search for “rare” events
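As one simple illustration of “discover the unknown”, the sketch below flags outliers with a modified z-score based on the median and the median absolute deviation (MAD). This is a generic robust-statistics technique chosen for illustration, not a method named in the talk; the function name, the sample magnitudes, and the 3.5 threshold are assumptions.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    med = statistics.median(values)
    deviations = [abs(v - med) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 rescales the MAD to match a standard deviation for normal data.
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Hypothetical magnitudes for one object; the eighth value is a bright flare.
mags = [20.1, 20.0, 19.9, 20.2, 20.0, 20.1, 19.8, 17.5, 20.0, 20.1]
print(mad_outliers(mags))
```

The median and MAD are used instead of the mean and standard deviation so that the rare event being searched for does not distort the definition of “typical”.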
The dimension reduction problem:
Finding correlations and “fundamental planes” of parameters
• The Curse of High Dimensionality!
– Are there combinations (linear or non-linear functions) of observational parameters that correlate strongly with one another?
– Are there eigenvectors or condensed representations (e.g., basis sets) that represent the full set of properties?
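The eigenvector question can be made concrete with principal component analysis. The sketch below finds the leading eigenvector of a covariance matrix by power iteration, using only the standard library. It covers the linear case only; the sample rows are invented, with the second attribute exactly twice the first and the third nearly constant.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of rows (one row per object)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_component(cov, iterations=200):
    """Power iteration: returns (largest eigenvalue, unit eigenvector)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return eigenvalue, v


# Hypothetical attributes: the second is twice the first, the third is small noise.
rows = [(1, 2, 0.1), (2, 4, -0.1), (3, 6, 0.1), (4, 8, -0.1), (5, 10, 0.0)]
cov = covariance_matrix(rows)
eigenvalue, v = leading_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print(f"leading component explains {explained:.1%} of the variance")
```

Here a single direction carries almost all of the variance, which is the signature of a strong linear correlation among the attributes. Non-linear combinations would need other techniques.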
Automated discovery
• Data exploration
• This is also required for automated Data Quality Assessment
How To Learn More / Get Involved?
LSST
• lsst.org
• Check out LSST database trac at http://dev.lsstcorp.org/trac/wiki/LSSTDatabase
XLDB
• XLDB4 (Oct 6-7 @ SLAC): open conference starting this year
• Read past XLDB reports: http://www-conf.slac.stanford.edu/xldb
• Share your use cases, join the community
SciDB
• 1st public release
• Check out http://scidb.org
• Try it out