  1. Quasi Real-time Data Analytics for Free Electron Lasers
     March 21st 2018, OSG AHM
     Amedeo Perazzo, LCLS Controls & Data Systems Division Director

  2. Outline
     ● Linac Coherent Light Source (LCLS) instruments and science case
     ● Data systems architecture
     ● Quasi real-time data analysis

  3. LCLS Science Case

  4. (Image-only slide)

  5. LCLS Instruments
     LCLS has already had a significant impact on many areas of science, including:
     ➔ Resolving the structures of macromolecular protein complexes that were previously inaccessible
     ➔ Capturing bond formation in the elusive transition state of a chemical reaction
     ➔ Revealing the behavior of atoms and molecules in the presence of strong fields
     ➔ Probing extreme states of matter

  6. Data Analytics for High Repetition Rate Free Electron Lasers
     FEL data challenge:
     ● Ultrafast X-ray pulses from LCLS are used like flashes from a high-speed strobe light, producing stop-action movies of atoms and molecules
     ● Both data processing and scientific interpretation demand intensive computational analysis
     LCLS-II will increase data throughput by three orders of magnitude by 2025, creating an exceptional scientific computing challenge.
     LCLS-II represents SLAC's largest data challenge.

  7. LCLS-II Data Analysis Pipelines: Nanocrystallography Example
     Pipeline stages: experiment description → multi-megapixel detector image → Data Reduction → X-ray diffraction patterns → Data Analysis → intensity map from multiple pulses → interpretation of system structure / dynamics
     ● Experiment description: individual nanocrystals are injected into the focused LCLS pulses; diffraction patterns are collected on a pulse-by-pulse basis; the crystal concentration dictates the "hit" rate
     ● Detector: 8 kHz in 2024 (4 MP), 40 kHz in 2027 (16 MP)
     ● Data Reduction (remove "no hits", >10x reduction, average): 60 GB/s in, 6 GB/s out and 3 TFlops in 2024; 1 TB/s in, 100 GB/s out and 16 TFlops in 2027
     ● Data Analysis (Bragg peak finding, index / orient patterns, 3D intensity map, reconstruction): 4 PFlops in 2024, 20 PFlops in 2027 (a quick stage-budget sketch follows this slide)
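As a rough illustration of how the pipeline budget above scales between the two detector generations, here is a minimal Python sketch; the rates and Flops are copied from the slide, and the reduction factor is simply the ratio of the quoted rates.

```python
# Stage budgets for the nanocrystallography pipeline, as quoted on the slide.
budgets = {
    "2024 (4 MP @ 8 kHz)":   dict(raw_gbps=60,   reduced_gbps=6,   reduce_tflops=3,  analyze_pflops=4),
    "2027 (16 MP @ 40 kHz)": dict(raw_gbps=1000, reduced_gbps=100, reduce_tflops=16, analyze_pflops=20),
}

for era, b in budgets.items():
    factor = b["raw_gbps"] / b["reduced_gbps"]  # ratio of the quoted in/out rates
    print(f"{era}: {b['raw_gbps']} GB/s -> {b['reduced_gbps']} GB/s "
          f"(~{factor:.0f}x reduction), "
          f"{b['reduce_tflops']} TFlops reduction, {b['analyze_pflops']} PFlops analysis")
```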

  8. Data Systems Architecture

  9. Computing Requirements for Data Analysis: a Day in the Life of a User
     ● During data taking:
       ○ Must be able to get real-time (~1 s) feedback about the quality of data taking (sketched below), e.g.
         ■ Are we getting all the required detector contributions for each event?
         ■ Is the hit rate for the pulse-sample interaction high enough?
       ○ Must be able to get feedback about the quality of the acquired data with a latency (~1 min) lower than the typical lifetime of a measurement (~10 min), in order to optimize the experimental setup for the next measurement, e.g.
         ■ Are we collecting enough statistics? Is the S/N ratio as expected?
         ■ Is the resolution of the reconstructed electron density what we expected?
     ● During off shifts: must be able to run multiple passes (>10) of the full analysis on the data acquired during the previous shift, to optimize analysis parameters and, possibly, code in preparation for the next shift
     ● During the 4 months after the experiment: must be able to analyze the raw and intermediate data on fast-access storage in preparation for publication
     ● After 4 months: if needed, must be able to restore the archived data to test new ideas, new code, or new parameters
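The ~1 s hit-rate feedback in the first bullet can be pictured as a small loop over the live event stream. A minimal sketch follows, assuming a placeholder event source and hit test (at LCLS the live stream and peak finding come from the psana framework; the names below are illustrative only).

```python
import random
import time

def event_stream():
    """Placeholder for the live per-pulse event source."""
    while True:
        # The hit decision is simulated here; in practice it comes from a
        # peak finder applied to the detector image for this pulse.
        yield {"is_hit": random.random() < 0.05}

def monitor_hit_rate(events, report_every_s=1.0):
    """Print the fraction of pulses that hit the sample roughly once per second."""
    hits = total = 0
    last_report = time.monotonic()
    for evt in events:
        total += 1
        hits += evt["is_hit"]
        now = time.monotonic()
        if now - last_report >= report_every_s:
            print(f"hit rate: {hits / total:.1%} over {total} pulses")
            hits = total = 0
            last_report = now

# monitor_hit_rate(event_stream())  # runs until interrupted
```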

  10. The Challenging Characteristics of LCLS Computing
      1. Fast feedback is essential (seconds / minute timescale) to reduce the time to complete the experiment, improve data quality, and increase the success rate
      2. 24/7 availability
      3. Short burst jobs, needing very short startup time
      4. Storage represents a significant fraction of the overall system
      5. Throughput between storage and processing is critical
      6. Speed and flexibility of the development cycle is critical - a wide variety of experiments, with rapid turnaround, and the need to modify data analysis during experiments
      Example data rates for LCLS-II (early science); an arithmetic check follows this slide:
      ● 1 x 4 Mpixel detector @ 5 kHz = 40 GB/s
      ● 100K-point fast digitizers @ 100 kHz = 20 GB/s
      ● Distributed diagnostics in the 1-10 GB/s range
      Example for LCLS-II and LCLS-II-HE (mature facility):
      ● 2 planes x 4 Mpixel ePixUHR @ 100 kHz = 1.6 TB/s
      Sophisticated algorithms under development within ExaFEL (e.g., M-TIP for single particle imaging) will require exascale machines.
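The example rates above are pixel (or sample) count × repetition rate × bytes per sample. A quick check, assuming 2 bytes per pixel/sample (a common choice for 16-bit detectors; the slide does not state it explicitly):

```python
BYTES_PER_SAMPLE = 2  # assumption: 16-bit pixels / digitizer samples

def rate_gbps(samples_per_shot, rep_rate_hz):
    """Sustained data rate in GB/s for a detector read out on every pulse."""
    return samples_per_shot * BYTES_PER_SAMPLE * rep_rate_hz / 1e9

print(rate_gbps(4e6, 5e3))              # 1 x 4 Mpixel @ 5 kHz          -> 40.0 GB/s
print(rate_gbps(100e3, 100e3))          # 100K-point digitizer @ 100 kHz -> 20.0 GB/s
print(rate_gbps(2 * 4e6, 100e3) / 1e3)  # 2 x 4 Mpixel ePixUHR @ 100 kHz -> 1.6 TB/s
```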

  11. LCLS-II Data Flow
      Detector → (up to 1 TB/s) → Data Reduction Pipeline (>10x reduction) → Fast Feedback storage → (up to 100 GB/s) → offline storage and HPC
      ● Onsite: Online Monitoring (~1 s), Fast Feedback (~1 min), and offline petascale storage + HPC for petascale experiments
      ● Offsite (NERSC, LCF): offline exascale storage + HPC for exascale experiments
      Data reduction mitigates storage, networking, and processing requirements.

  12. Data Reduction Pipeline
      ● Besides cost, there are significant risks in not adopting on-the-fly data reduction: inability to move the data to HEC, and system complexity (robustness, intermittent failures)
      ● Developing a toolbox of techniques (compression, feature extraction, vetoing) to run on a Data Reduction Pipeline (a toy sketch follows this slide)
      ● Significant R&D effort, both engineering (throughput, heterogeneous architectures) and scientific (real-time analysis)
      Without on-the-fly data reduction we would face unsustainable hardware costs by 2026.
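A toy sketch of the three techniques named in the toolbox bullet (vetoing, feature extraction, compression). The thresholds, region of interest, and peak count below are illustrative placeholders, not the production Data Reduction Pipeline code.

```python
import zlib
import numpy as np

def veto(image, min_peaks=10, threshold=50):
    """Drop 'no hit' frames: keep an image only if enough pixels exceed a threshold."""
    return int((image > threshold).sum()) >= min_peaks

def extract_features(image):
    """Keep a small region of interest plus summary statistics instead of the full frame."""
    roi = image[:128, :128]  # illustrative ROI
    return {"roi": roi, "total": float(image.sum()), "max": int(image.max())}

def compress(image):
    """Lossless compression of raw frames that must be kept whole."""
    return zlib.compress(image.tobytes())

frame = np.random.poisson(5, size=(512, 512)).astype(np.uint16)  # stand-in detector frame
if veto(frame):
    kept = {"features": extract_features(frame), "raw": compress(frame)}
else:
    kept = None  # 'no hit' frame is discarded before it ever reaches offline storage
```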

  13. Make Full Use of National Capabilities
      ● LCLS-II will require access to High End Computing facilities (NERSC and LCF) for the highest-demand (exascale) experiments
      ● Photon Science Speedway: stream science data files on-the-fly from the LCLS beamlines to the NERSC supercomputers via ESnet
      (Diagram: LCLS beamlines at SLAC connected via ESnet to MIRA at Argonne, TITAN at Oak Ridge, and CORI at NERSC.)

  14. Quasi Real-time Data Analysis

  15. ExaFEL: Data Analytics at the Exascale for Free Electron Lasers
      An Application Project within the Exascale Computing Project (ECP):
      ● High data throughput experiments: example test cases of Serial Femtosecond Crystallography and Single Particle Imaging
      ● LCLS data analysis framework: porting LCLS code to supercomputer architectures, allowing scaling from hundreds of cores (now) to hundreds of thousands of cores (a data-parallel sketch follows this slide)
      ● Infrastructure: data flow from SLAC to NERSC over ESnet
      ● Algorithmic improvements and ray tracing
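The scaling described in the framework bullet is, at heart, data parallelism over diffraction images. A minimal mpi4py sketch of that pattern, with a placeholder dataset and per-image analysis (not ExaFEL's actual code):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_images = 1_000_000  # placeholder dataset size

def analyze(i):
    """Placeholder for per-image work (peak finding, indexing, ...)."""
    return i % 7 == 0  # pretend roughly 1 in 7 images indexes successfully

# Round-robin the images over however many ranks the job was launched with.
local_hits = sum(analyze(i) for i in range(rank, n_images, size))
total_hits = comm.reduce(local_hits, op=MPI.SUM, root=0)
if rank == 0:
    print(f"{total_hits} of {n_images} images indexed across {size} ranks")
```

Launched with, e.g., `mpirun -n 4 python scaling_sketch.py`, the same script runs unchanged on hundreds or hundreds of thousands of ranks; in practice, parallel I/O and load balancing are where the real exascale effort goes.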

  16. From Terascale to Exascale
      Exascale vastly expands the experimental repertoire and the computational scale.
      (Figure: analytical detail and scientific payoff vs. number of diffraction patterns analyzed, from present-day to exascale. Picture credit: Kroon-Batenburg et al. (2015) Acta Cryst D71:1799.)
      ● M-TIP (Single Particle Imaging): enables de novo phasing, for atomic structures with no known analogues
      ● Ray tracing: increased accuracy
      ● IOTA toolkit: wider parameter search and a higher acceptance rate for diffraction images (per the figure, roughly 10% of total images accepted today vs. 54% with IOTA)
      ● CCTBX: exascale modeling of Bragg spots

  17. Scaling the Nanocrystallography Pipeline
      ● Avoidance of radiation damage and the emphasis on physiological conditions require a transition to fast (fs) X-ray light sources ("diffraction before destruction") and large (10^6-image) megapixel-detector datasets
      ● Real-time data analysis within minutes provides results that feed back into experimental decisions, improving the use of scarce sample and beam time
      ● Terabyte diffraction-image datasets collected at SLAC / LCLS are transferred to NERSC over ESnet and analyzed on Cori / KNL
      (Figure: X-ray diffraction images from a megapixel detector → intensity map from multiple pulses → 3D electron density of the macromolecule. Nick Sauter, LBNL.)

  18. Processing Needs: Onsite vs. Offsite
      ● Key requirement: data analysis must keep up with data-taking rates
      ● We expect ~150 experiments per year, with a typical experiment lasting ~3 x 12-hour shifts
      ● Example: an experiment requiring 1 PFLOPS of capability would fully utilize a 1 PFLOPS machine for 36 hours, for a total of 36 M G-hours (checked in the sketch below)
      (Figure: bubble chart of onsite processing vs. surge to offsite (NERSC & LCF). The size of each bubble represents the fraction of experiments per year whose analysis requires the computing capability, in floating point operations per second, shown on the vertical axis; CPU-hours per experiment follow from multiplying the capability requirement (rate) by the lifetime of the experiment.)
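The 36 M G-hours figure follows from rate × duration. Reading "G-hours" as GFLOPS-hours (an assumption, since the slide does not spell out the unit), the arithmetic is:

```python
# Assumption: "G-hours" = GFLOPS-hours; 1 PFLOPS = 10^6 GFLOPS.
capability_gflops = 1 * 1e6   # example experiment: 1 PFLOPS of capability
experiment_hours = 3 * 12     # typical experiment: three 12-hour shifts
gflops_hours = capability_gflops * experiment_hours
print(f"{gflops_hours / 1e6:.0f} M GFLOPS-hours")   # -> 36 M
```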

  19. Offsite Data Transfer: Needs and Plans
      (Figure: ESnet6 upgrade timeline, NERSC plans, and SLAC plans for LCLS-I and LCLS-II.)

  20. Towards Automation: End-to-end Workflow
      ● The workflow manages the combination of data streams, hardware system components, and applications needed to derive, in quasi real-time, the electron density from the diffraction images acquired at the beamline
      ● Stream the data from the LCLS online cache (NVRAM) to the SLAC data transfer nodes
      ● Stream the data over an SDN path from the SLAC DTNs to the NERSC DTNs (the actual DTNs are a subset of the supercomputer nodes)
      ● Write the data to the burst buffer layer (NVRAM) at NERSC, staged in / copied from LCLS via Xrootd over ESnet
      ● Distribute the data from the burst buffers to the local memory on the HPC nodes
      ● Orchestrate the reduction, merging, phasing, and visualization parts of the SFX analysis
      Demo components: a Rocket Launcher GUI runs on a Cori interactive/login node (psana:live, MySQL; NEWT in the future); a job interface daemon (JID) runs on a science gateway node (https://portal-auth.nersc.gov/lbcd/) and launches analysis jobs with SLURM sbatch against a Mongo DB (Docker on SPIN with NEWT sbatch in the future); job-monitoring data (status, speed, etc.) is returned as JSON over HTTPS to the LCLS webUI. A stripped-down sketch of the submission and status-reporting steps follows this slide.
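The last two workflow roles above (launching analysis jobs and reporting their status) can be pictured as: submit a stage with SLURM's sbatch, then post monitoring data as JSON over HTTPS. The batch-script name and status endpoint below are hypothetical placeholders, not the actual JID interface.

```python
import json
import subprocess
import urllib.request

def submit(batch_script):
    """Submit one stage of the SFX analysis via SLURM and return the job id."""
    out = subprocess.run(["sbatch", "--parsable", batch_script],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip().split(";")[0]

def report_status(endpoint, job_id, state):
    """Post job-monitoring data (status, etc.) as JSON over HTTPS, as the JID does."""
    payload = json.dumps({"job_id": job_id, "state": state}).encode()
    req = urllib.request.Request(endpoint, data=payload,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

# job_id = submit("sfx_merge.sbatch")                               # hypothetical script name
# report_status("https://example.invalid/jid/status", job_id, "SUBMITTED")  # hypothetical endpoint
```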
