Current and Future Data Intensive Computing at DOE BES User Facilities
Steve Miller, Scientific Computing Group Leader, Neutron Scattering Science Division
Mark L. Green, Tech-X Corporation
DOE Science Programs Org Chart
http://www.er.doe.gov/about/Organization/Organization_chart/OneSC-org.pdf#page=2
Managed by UT-Battelle for the U.S. Department of Energy | NSSD – Data Analysis
DOE Neutron and X-Ray User Facilities
“As part of its mission, the Office of Basic Energy Sciences (BES) plans, constructs, and operates major scientific user facilities to serve researchers from universities, national laboratories, and industry.”
• National Synchrotron Light Source (NSLS) – BNL, Brookhaven NY
• National Synchrotron Light Source II (NSLS-II) – BNL, Brookhaven NY
• Stanford Synchrotron Radiation Lightsource (SSRL) – SLAC, Stanford CA
• Advanced Light Source (ALS) – LBNL, Berkeley CA
• Advanced Photon Source (APS) – ANL, Chicago IL
• Linac Coherent Light Source (LCLS) – SLAC, Stanford CA
• Spallation Neutron Source (SNS) – ORNL, Oak Ridge TN
• High Flux Isotope Reactor (HFIR) – ORNL, Oak Ridge TN
• Manuel Lujan Jr. Neutron Scattering Center (Lujan Center) – LANL, Los Alamos NM
http://www.sc.doe.gov/bes/BESfacilities.htm
Facility Users and Their Research
• BES user facilities serve over 10,000 scientists and researchers annually
  – Facility users are largely NSF-funded to perform their research
  – Diverse science areas include materials science, biology, chemistry, physics, crystallography, geology, and more
• Data are collected using techniques including:
  – spectroscopy
  – diffraction
  – scattering
  – imaging
  – tomography
http://www.sc.doe.gov/bes/BESfacilities.htm
ORNL Neutron Scattering Facilities: SNS and HFIR
• SNS construction completed May 2006; user operations December 2007
• SNS has achieved ~800 kW beam power
• SNS instruments: 7 fully operational, 4 commissioning, and 2 more being added in FY10 (23 instruments planned in total)
• A second SNS target station is being planned, which will add another ~24 instruments
• HFIR cold source operational May 2007 (one of the brightest in the world)
• HFIR instruments: 7 operational (2 SANS on the cold source), 2 commissioning
Instruments are Big!
[Figure: the Sequoia instrument and its detector banks, with a person shown for scale]
Sequoia instrument:
• 187 detector banks
• 8 tubes per bank, 128 pixels per tube, 8333 time-of-flight (TOF) channels per pixel, 4 bytes each
• Total histogram size per “run”: ~6 GB; some instruments will produce 15 GB per run
• Experiments comprise multiple runs
• Estimated cost to produce the data: approximately $32M per TB
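The ~6 GB figure follows directly from the detector geometry listed above. A short Python check of that arithmetic, assuming a dense pixel × TOF histogram with one 4-byte counter per bin and no file-format overhead (the function name `histogram_bytes` is purely illustrative):

```python
# Worked arithmetic for the Sequoia histogram size, using the slide's numbers.
BANKS = 187                # detector banks
TUBES_PER_BANK = 8
PIXELS_PER_TUBE = 128
TOF_BINS_PER_PIXEL = 8333
BYTES_PER_BIN = 4          # assumed: one 32-bit counter per bin


def histogram_bytes(banks, tubes_per_bank, pixels_per_tube, tof_bins, bytes_per_bin):
    """Size in bytes of a dense (pixel x TOF) histogram for one run."""
    pixels = banks * tubes_per_bank * pixels_per_tube
    return pixels * tof_bins * bytes_per_bin


pixels = BANKS * TUBES_PER_BANK * PIXELS_PER_TUBE
size = histogram_bytes(BANKS, TUBES_PER_BANK, PIXELS_PER_TUBE,
                       TOF_BINS_PER_PIXEL, BYTES_PER_BIN)
print(f"{pixels:,} pixels")
print(f"{size:,} bytes = {size / 1e9:.2f} GB = {size / 2**30:.2f} GiB")
```

This gives 191,488 pixels and about 6.38 GB (5.94 GiB) per run, consistent with the ~6 GB quoted for Sequoia.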
Data Production
• SNS
  – 100 MB to 15 GB per measurement, depending on the instrument
  – Data collected in “event mode”, which can be streamed
  – 1.3 GB/day/instrument average data rate
• APS
  – 60 beamlines
  – Data collected ranges from a few KB to 100 GB
  – Tomography beamlines can produce 10 TB per experiment
  – Diffraction instruments can collect 300 MB/sec continuously
• LCLS
  – Up to tens of GB/sec peak data rate
  – 1 TB/day
  – First data to be collected in September
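Event-mode data can be reduced to a histogram incrementally, which is what makes streaming and live views practical. A minimal Python sketch, assuming each event is a hypothetical `(pixel_id, time_of_flight_us)` pair and fixed-width TOF bins; the bin range and the name `histogram_events` are illustrative and not the SNS reduction software:

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event record: (pixel_id, time_of_flight_us).
# Real event files carry more (e.g. pulse times); this is only a sketch.
Event = Tuple[int, float]


def histogram_events(events: Iterable[Event], tof_min: float, tof_max: float,
                     n_bins: int) -> Counter:
    """Accumulate a stream of events into sparse (pixel_id, tof_bin) counts.

    Events are consumed one at a time, so the input can be a live stream or a
    file read in chunks; only bins that receive at least one event are stored.
    """
    if n_bins <= 0 or tof_max <= tof_min:
        raise ValueError("need n_bins > 0 and tof_max > tof_min")
    width = (tof_max - tof_min) / n_bins
    counts: Counter = Counter()
    for pixel_id, tof in events:
        if tof_min <= tof < tof_max:
            # Clamp guards against floating-point rounding at the upper edge.
            tof_bin = min(int((tof - tof_min) / width), n_bins - 1)
            counts[(pixel_id, tof_bin)] += 1
    return counts


# Two chunks of a stream are histogrammed separately, then merged.
chunk_1 = [(10, 1500.0), (10, 1502.0), (11, 16000.0)]
chunk_2 = [(10, 1519.9), (12, 40000.0)]  # the second event is out of range
live = histogram_events(chunk_1, 0.0, 20000.0, 1000)
live.update(histogram_events(chunk_2, 0.0, 20000.0, 1000))
print(live)
```

Because partial histograms from separate chunks simply add (`Counter.update` sums counts), the same routine can refresh a live display as events arrive. Storage grows with the number of occupied bins rather than with the full pixel × TOF grid.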
Data Access
• For some instruments, users bring their own USB drives and copy data during their experiment time
• FTP
• Web-based data portal
• File formats:
  – NeXus (HDF5-based) format for neutron and some X-ray instruments
  – Proprietary or non-standard data formats for some instruments
Issues
• User authentication and cybersecurity
• Metadata definition, capture, and association with data
• Facility infrastructure capacity
• Data policy
• Computing culture
  – “the reduction software and data format for these large instruments needs rethinking or papers will not be published. My data files are too big (over 1.6Gb) and even with 4Gb of ram on my computer and looking at my data is painful.”
Emerging Vision – Interconnected Facilities
• Facilitate new scientific discoveries via multi-technique data analyses
  – SNS, HFIR, APS, Lujan, and LCLS already collaborating
• User Facility network (UFnet) needed on top of ESnet
• Each facility needs a “Network Node”, which would be enabled via “Nodeware”
• User-defined, ad-hoc Virtual Organizations enable data sharing
• Facilities seek to provide users with inter-facility data movement
• Thick client applications
  – run locally or remotely
  – coordinate jobs and data movement
  – give users the autonomy they want
• Orbiter thick client platform under construction via DOE SBIR with Tech-X
  – abstracting operation from analysis
Software Development Strategy
• Facility concentrates on science software development
• Provide facility data management infrastructure
• Leverage where possible!
  – SciDAC tools
  – ESnet
  – TeraGrid
  – Collaborations and partnerships
Data Intensive Computing for Users
• Want to process collections of data comprising multiple runs
• Willing to wait seconds to minutes for results
• Need to visualize data
  – Processed
  – Live streaming
• Want the same access at home as at the instrument
• Growing interest in collaborating and in using multi-technique data
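Processing a collection of runs often starts by merging them onto a common binning. A minimal Python sketch, assuming each run is already histogrammed on identical bins and is normalized by an integrated beam measure such as accumulated proton charge (the normalization choice and the name `combine_runs` are assumptions for illustration):

```python
from typing import List, Sequence


def combine_runs(histograms: Sequence[Sequence[float]],
                 charges: Sequence[float]) -> List[float]:
    """Sum per-bin counts over runs, then normalize by the total beam charge.

    Assumes all runs share the same binning and that each run's counts scale
    with an integrated beam measure ("charge", e.g. accumulated proton charge).
    """
    if not histograms:
        raise ValueError("need at least one run")
    if len(histograms) != len(charges):
        raise ValueError("need one charge value per run")
    n_bins = len(histograms[0])
    if any(len(h) != n_bins for h in histograms):
        raise ValueError("all runs must share the same binning")
    total_charge = sum(charges)
    if total_charge <= 0:
        raise ValueError("total charge must be positive")
    summed = [sum(h[i] for h in histograms) for i in range(n_bins)]
    return [c / total_charge for c in summed]


run_a = [10, 20, 30]   # counts per bin, charge 2.0
run_b = [5, 10, 15]    # counts per bin, charge 1.0
print(combine_runs([run_a, run_b], [2.0, 1.0]))
```

Summing raw counts before dividing by the total charge is equivalent to a charge-weighted average of the per-run normalized histograms, so runs with more beam on sample carry proportionally more weight.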
Questions?