Towards a BES Light Source-Wide Event-Triggered Tomography Data Analysis Pipeline Using a Sustainable Software Stack
Hari Krishnan, Lawrence Berkeley National Laboratory
CAMERA – Center for Advanced Mathematics for Energy Research Applications
ALS – Advanced Light Source
Data Pilot – DOE BES Light Source Pilot Project
Credits for slides go to the BES Data Working Group and its members.
Introduction to Tomography
Tomography @ ALS – 8.3.2
[Figure: Beamline 8.3.2: acquisition of images plus metadata (motor positions, analog/digital inputs) → local backup (data8.3.2) → storage over 10G → NERSC and CAMERA. Ptycho-tomography @ COSMIC: a charge-coupled detector records a single diffraction pattern → ptychography yields a single projection image → multiple projections, scanned across +/- 70 degrees of rotation, are sent to storage over 10G → 2D reconstructed projections → surface rendering of the 3D volume.]
Tomography Analysis
[Figure: Components: the experiment / data acquisition; a compute cluster running a Dask server (master event loop) and Dask workers (remote event loop); a Dask client with a graphical user interface on the user's local computer. Client side: 1. Send workflow; 2. Execute graph; 3. Return result; 4. Visualization updates. Execution steps on the Dask workers: 1. Read; 2. Normalize; 3. Remove outlier; 4. Remove stripe; 5. Padding; 6. Reconstruction; 7. Crop; 8. Circular mask; 9. Output (write).]
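The nine execution steps form a per-slice pipeline that the Dask workers run in parallel. As a minimal, self-contained sketch, the code below uses NumPy stand-ins for steps 2 through 8 and a thread pool in place of Dask workers. All function names are hypothetical, the outlier and stripe filters are deliberately crude, and step 6 is a toy unfiltered backprojection rather than the filtered or iterative algorithms (for example, gridrec in TomoPy) a production pipeline would use. Reading (step 1) and writing (step 9) are left to the caller.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def normalize(proj, flat, dark):
    """Step 2: flat/dark correction, then -log to obtain line integrals."""
    transmission = (proj - dark) / np.maximum(flat - dark, 1e-6)
    return -np.log(np.clip(transmission, 1e-6, None))


def remove_outlier(sino, dif=0.5):
    """Step 3 (crude): replace pixels differing from a 3-pixel median by more than dif."""
    neighbors = np.stack([np.roll(sino, 1, axis=-1), sino, np.roll(sino, -1, axis=-1)])
    med = np.median(neighbors, axis=0)
    return np.where(np.abs(sino - med) > dif, med, sino)


def remove_stripe(sino, width=5):
    """Step 4 (crude): remove the non-smooth part of each detector column's mean."""
    col_mean = sino.mean(axis=0)
    smooth = np.convolve(col_mean, np.ones(width) / width, mode="same")
    return sino - (col_mean - smooth)


def backproject(sino, theta):
    """Step 6 (toy): unfiltered parallel-beam backprojection onto an n x n grid."""
    n_angles, n = sino.shape
    c = (n - 1) / 2.0
    y, x = np.mgrid[:n, :n] - c
    det = np.arange(n) - c
    img = np.zeros((n, n))
    for row, th in zip(sino, theta):
        t = x * np.cos(th) + y * np.sin(th)  # detector coordinate hit by each pixel
        img += np.interp(t.ravel(), det, row).reshape(n, n)
    return img * np.pi / n_angles


def circular_mask(img):
    """Step 8: zero everything outside the inscribed circle."""
    n = img.shape[0]
    y, x = np.mgrid[:n, :n] - (n - 1) / 2.0
    return np.where(x**2 + y**2 <= (n / 2.0) ** 2, img, 0.0)


def reconstruct_slice(sino, theta, pad):
    sino = remove_stripe(remove_outlier(sino))
    sino = np.pad(sino, ((0, 0), (pad, pad)), mode="edge")  # step 5: padding
    img = backproject(sino, theta)
    if pad:
        img = img[pad:-pad, pad:-pad]  # step 7: crop back to the original width
    return circular_mask(img)


def run_pipeline(proj, flat, dark, theta, pad=8, workers=4):
    """proj has shape (angles, rows, detector); each row's sinogram is an independent task."""
    sinos = normalize(proj, flat, dark)

    def one_slice(row):
        return reconstruct_slice(sinos[:, row, :], theta, pad)

    with ThreadPoolExecutor(max_workers=workers) as pool:  # stand-in for Dask workers
        return np.stack(list(pool.map(one_slice, range(sinos.shape[1]))))
```

Splitting by sinogram row is what makes the graph embarrassingly parallel: each slice depends only on its own sinogram, so a Dask client could submit one task per row in the same way.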
Post Processing Tomography
Streaming Analysis
[Figure: The experiment / data acquisition sends a tomogram stream to a compute cluster running a Dask server (master event loop) and dynamically scaled Dask workers (remote event loop); a Dask client on the user's local computer drives it. Execution steps: 1. Send workflow (GUI); 2. Execute tasks (Task 2, Task 3, …, Task N); 3. Dynamically decide which tasks to execute next.]
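The distinctive step in streaming mode is step 3: deciding at run time which tasks to execute next, based on results that have already arrived. The sketch below shows that control pattern using Python's standard-library `concurrent.futures` as a stand-in for a Dask cluster. The task functions, the preview metric, and the threshold rule are all hypothetical, and dynamic scaling of the worker pool itself is not modeled.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def quick_preview(frame_id, data):
    """Cheap task run on every incoming frame (hypothetical metric: mean value)."""
    return ("preview", frame_id, sum(data) / len(data))


def full_analysis(frame_id, data):
    """Expensive task, scheduled only for frames the controller selects (hypothetical)."""
    return ("full", frame_id, max(data) - min(data))


def stream_controller(frames, threshold, max_workers=4):
    """Submit previews as frames arrive; promote frames whose preview exceeds threshold."""
    results, pending, buffered = {}, set(), {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:

        def handle(done):
            for fut in done:
                kind, fid, value = fut.result()
                results[(kind, fid)] = value
                # Dynamic decision: the next task depends on a result just received.
                if kind == "preview" and value > threshold:
                    pending.add(pool.submit(full_analysis, fid, buffered[fid]))

        for frame_id, data in frames:  # frames arrive one at a time
            buffered[frame_id] = data
            pending.add(pool.submit(quick_preview, frame_id, data))
            done = {f for f in pending if f.done()}  # react to whatever has finished
            pending -= done
            handle(done)

        while pending:  # drain remaining work once the stream ends
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            pending -= done
            handle(done)
    return results
```

For example, with frames `(i, [i, i + 1, i + 2])` for `i` in 0..5 and a threshold of 3, every frame gets a preview, but only frames 3, 4, and 5 are promoted to the full analysis.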
Tomography @ APS – 2-BM
[Figure: Pre-process → reconstruction (parallel SIRT) → real-time feedback; compute on the APS cluster and ALCF, in collaboration with CAMERA.]
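SIRT (Simultaneous Iterative Reconstruction Technique) updates every voxel at once from all ray residuals, which is what makes it amenable to parallel execution. Below is a minimal dense-matrix sketch of the standard update x ← x + C Aᵀ R (b − A x), where R and C hold the inverse row and column sums of a nonnegative system matrix A. It illustrates the algorithm only; it is not the APS/ALCF implementation, which would use a sparse or on-the-fly projector.

```python
import numpy as np


def sirt(A, b, n_iter=200, x0=None):
    """SIRT: x <- x + C A^T R (b - A x), R = 1/row sums, C = 1/column sums of A."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    row_sums = A.sum(axis=1)
    col_sums = A.sum(axis=0)
    # Inverse sums, leaving zero weight for empty rays or unreached voxels.
    R = np.divide(1.0, row_sums, out=np.zeros_like(row_sums), where=row_sums > 0)
    C = np.divide(1.0, col_sums, out=np.zeros_like(col_sums), where=col_sums > 0)
    x = np.zeros(A.shape[1]) if x0 is None else np.array(x0, dtype=float)
    for _ in range(n_iter):
        residual = b - A @ x                # forward-project and compare with data
        x += C * (A.T @ (R * residual))     # back-project the weighted residual
    return x
```

Both matrix-vector products in the loop parallelize naturally over rays and voxels, which is the property a parallel SIRT exploits.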
Streaming Tomography at APS - Collaboration with MONA (APS-LBNL-BNL)
Workflows – Software View: APS, ALS, SSRL, NSLS-II
Credit: Data Pilot Tomography Breakout Report (all credit goes to the respective authors)
Workflows – Hardware View
[Figure: ALS 8.3.2 tomography → CPU workers on NERSC Edison/Cori; APS 2-BM tomography → APS tomography cluster (CPU and GPU workers) and ALCF; NSLS-II FXI-18 tomography → compute resources reached over SSH, with GPU workers.]
Challenges – Current BES Light Source Data Generation and Computing Estimates
• Acquisition – Custom instrumentation, detectors, drivers
• Networks – Custom network infrastructure & authentication
• Hardware – Custom hardware (FPGAs, GPUs, CPUs, …)
• Workflows – Custom analytics & software dependencies

Year   ALS     APS      LCLS/LCLS-II   NSLS-II   SSRL
2021   3 PB    7 PB     30 PB          42 PB     15 PB
2028   31 PB   243 PB   300 PB         85 PB     15 PB
Estimated data generation rates per year at the BES Light Sources. At the ALS and APS, data generation will stop during 2025 and 2023, respectively, due to installations of new storage rings. Aggregate data generation across the BES Light Sources will approach the exabyte (EB) range.

Year   ALS          APS         LCLS/LCLS-II     NSLS-II      SSRL
2021   0.1 PFLOPS   4 PFLOPS    1 - 100 PFLOPS   2.5 PFLOPS   < 1 PFLOPS
2028   30 PFLOPS    50 PFLOPS   1 - 1,000 PFLOPS 45 PFLOPS    < 1 PFLOPS
Estimated PFLOPS of on-demand computing resources required by each of the BES Light Sources by 2021 and 2028. Compute jobs requiring < 10 PFLOPS are common and best run on local resources; compute jobs requiring > 10-20 PFLOPS are best suited to run at a high-end computing facility.
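Summing the per-facility data-generation estimates makes the exabyte remark concrete: about 97 PB per year in 2021 versus about 674 PB per year in 2028, roughly 0.67 EB (using 1 EB = 1000 PB). The helper below also encodes the caption's local-versus-HPC rule of thumb; treating the 10-20 PFLOPS overlap as a gray zone is an interpretation, not something the estimates specify.

```python
# Estimated data generation per year in PB: facility -> (2021, 2028).
DATA_PB = {
    "ALS": (3, 31),
    "APS": (7, 243),
    "LCLS/LCLS-II": (30, 300),
    "NSLS-II": (42, 85),
    "SSRL": (15, 15),
}

total_2021 = sum(v2021 for v2021, _ in DATA_PB.values())
total_2028 = sum(v2028 for _, v2028 in DATA_PB.values())
print(total_2021, total_2028, total_2028 / 1000)  # 97 674 0.674  (PB, PB, EB)


def placement(pflops: float) -> str:
    """Caption's rule of thumb; the 10-20 PFLOPS gray zone is an assumed reading."""
    if pflops < 10:
        return "local resources"
    if pflops > 20:
        return "high-end computing facility"
    return "either (10-20 PFLOPS gray zone)"
```

By this rule, the 2021 APS estimate (4 PFLOPS) fits local resources, while the 2028 NSLS-II estimate (45 PFLOPS) points to a high-end computing facility.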
High-Priority Shared Needs
The mission requires computing advances in four main areas:
• Data management and workflow tools
  – Integrate beamline instruments with compute and storage
• Real-time data analysis capabilities
  – Reduce data volumes
  – Provide feedback during experiments
  – Apply tools to steer data collection (algorithms, ML, simulation)
• On-demand utilization of computing environments
• Data storage and archival
Building on Common Software Tools (BES Data Pilot Project)
• Advanced Analysis & Automation
  – Advanced analysis & visualization (post-reconstruction)
  – Multi-modal analysis
  – Automated acquisition (ML support)
• Algorithms & Data Quality
  – Streaming (real-time) analysis
  – Implementation of reconstruction algorithms for shared use
  – Advanced visualization features
  – Real-time data quality checks
• Exchange & Standardization
  – Common database
  – Knowledge base of software & algorithms
  – Standardized data structures
  – Standardized acquisition
Towards a Sustainable Software Stack
[Figure: Layers for Instrument, Detector, Controls, Orchestration, Algorithms/Workflow, and Computing Resources, connected by Metadata, Data, and Feedback flows. Components shown: bluesky, PyDM, DataBroker, Ensemble Run, TomoPy, XPCS-Eigen, Xi-CAM.]
Part 1: Acquisition & Controls
Goal of the Bluesky Project
Overall: make it easy for synchrotrons to leverage the ecosystem of freely available, open-source scientific Python community tools.
Across the DOE light sources, the Bluesky Data Broker is used to search and retrieve scientific data for interactive and automated data analysis.
Figure credit: Jake VanderPlas, "The Unexpected Effectiveness of Python in Science", PyCon 2017.
Bluesky Ecosystem
Data Broker
How Data Broker fits into this
• Released Data Broker version 1.0, installed at the five DOE light sources
• Unifies data access across the facilities
• Improved usability, incorporating 5 years of user feedback on "beta" versions
• New hands-on tutorial materials for scientists at blueskyproject.io/tutorials
• Leverages community scientific Python projects under the hood for:
  – Labeled, physically meaningful data structures
  – Scaling across thousands of nodes on HPC, cloud, or traditional servers
  – Remaining unopinionated about data formats
(This work was also funded in part by light source facility operations.)
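At its core, Data Broker lets users search runs by their metadata and then retrieve the associated data, regardless of which facility produced them. The toy catalog below illustrates that search-then-retrieve pattern over runs from several facilities. The `Run` and `ToyCatalog` classes are hypothetical; the real Data Broker has its own catalog and query API, which this sketch does not reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    start: dict                               # metadata recorded when the run began
    data: dict = field(default_factory=dict)  # illustrative payload, e.g. images


class ToyCatalog:
    """Hypothetical in-memory catalog illustrating metadata search; NOT the databroker API."""

    def __init__(self, runs):
        self._runs = {run.start["uid"]: run for run in runs}

    def search(self, **query):
        # Keep runs whose start metadata matches every key/value pair in the query.
        hits = [r for r in self._runs.values()
                if all(r.start.get(k) == v for k, v in query.items())]
        return ToyCatalog(hits)  # results are catalogs too, so searches compose

    def __len__(self):
        return len(self._runs)

    def __getitem__(self, uid):
        return self._runs[uid]

    def uids(self):
        return sorted(self._runs)
```

Because `search` returns another catalog, narrowing a query (first by technique, then by facility) is just a chained call.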
Controls Integration within the Bluesky Ecosystem
Development of user interface frameworks that facilitate data acquisition and intelligent beamline control applications across the DOE light sources, including:
• Happi – device location/attribute database (Happi DB)
• ophyd – device abstraction layer
• PyDM – Python Display Manager
• Typhos – user interface generator
Overview - Ease of use (PyDM & Qt Designer)
Part 2: Analysis & Algorithms – Xi-CAM & Workflows
• GUI frontend and extensible framework for synchrotron data…
  – acquisition
  – analysis
  – visualization
  – management
• Utilizes software components developed by many external groups, including NSLS-II, APS, ALS, and SLAC (e.g., TomoPy, Astra, LTT)
• Deployment platform for analysis algorithms, such as those from CAMERA
XPCS: X-ray Photon Correlation Spectroscopy
• Probes dynamics/fluctuations in materials
  – length scale: down to nanometers; time scale: minutes to milliseconds
• X-ray data are 2D image series that exhibit speckle fluctuations (sample dynamics)
• First XPCS experiment in 1995; an emerging technique as coherent flux increases:
  – Faster time scales (nanoseconds)
  – Tunable beamline energies for atomic species
  – In situ or in operando experiments
[Figures: 2D small-angle scattering pattern from a suspension of silica spheres; g2 calculation at two different length scales.]
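The g2 curves come from the intensity autocorrelation g2(τ) = ⟨I(t) I(t+τ)⟩_t / ⟨I(t)⟩_t², computed per pixel and then averaged over a detector region (typically a ring of constant scattering vector q, which sets the length scale). A minimal linear-lag sketch follows. Production XPCS analysis often uses multi-tau correlators and other normalizations, so treat this as illustrative only; it also assumes every pixel has nonzero mean intensity.

```python
import numpy as np


def g2_linear(frames: np.ndarray, max_lag: int) -> np.ndarray:
    """g2(tau) for tau = 0..max_lag from a (time, y, x) stack, averaged over pixels."""
    frames = np.asarray(frames, dtype=float)
    n_t = frames.shape[0]
    if max_lag >= n_t:
        raise ValueError("max_lag must be smaller than the number of frames")
    mean_sq = frames.mean(axis=0) ** 2  # <I>^2 for each pixel
    g2 = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # <I(t) I(t + tau)> over all available time pairs, per pixel
        num = (frames[: n_t - lag] * frames[lag:]).mean(axis=0)
        g2[lag] = (num / mean_sq).mean()  # average over the pixels in the region
    return g2
```

For a stack whose pixels all alternate 1, 3, 1, 3, this gives g2 = [1.25, 0.75, 1.25] for lags 0 to 2; a constant-intensity stack gives g2 = 1 at every lag, i.e. no fluctuations.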
Ptychography
• Scanning coherent diffractive imaging (CDI) technique
• Extremely high spatial resolution (low nanometer)
• Versatile applications
• Complementary techniques
Versatility → very popular; many algorithms; high data rates.
[Figure: Exemplary ptychography setup (source: Weker Group, SSRL): scanning the sample and the corresponding diffraction patterns.]
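Each scan position records only the intensity of the far-field diffraction pattern of the illuminated region; the overlap between neighboring positions is what gives reconstruction algorithms the redundancy to recover the lost phase. The sketch below is a noise-free forward model of such a scan under thin-object and far-field approximations. The object, probe, and step size are arbitrary assumptions, and the code simulates data rather than reconstructing anything.

```python
import numpy as np


def raster_positions(obj_shape, probe_size, step):
    """Top-left corners of a raster scan; step < probe_size gives overlapping illumination."""
    rows = range(0, obj_shape[0] - probe_size + 1, step)
    cols = range(0, obj_shape[1] - probe_size + 1, step)
    return [(r, c) for r in rows for c in cols]


def simulate_ptycho_scan(obj, probe, positions):
    """Noise-free far-field forward model: one diffraction pattern per scan position."""
    n = probe.shape[0]
    patterns = []
    for r, c in positions:
        exit_wave = probe * obj[r:r + n, c:c + n]  # thin-object approximation
        far_field = np.fft.fftshift(np.fft.fft2(exit_wave, norm="ortho"))
        patterns.append(np.abs(far_field) ** 2)    # the detector records intensity only
    return np.stack(patterns)
```

Even a 64 x 64 object scanned with a 16-pixel probe at an 8-pixel step yields 49 patterns, which hints at why ptychography produces high data rates at realistic scales.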
Putting it all together
[Figure: Layers for Instrument, Detector, Controls, Orchestration, Algorithms/Workflows, and Computing Resources, connected by Metadata, Data, and Feedback flows. Components shown: Bluesky, PyDM, DataBroker, Ensemble Run, TomoPy, Astra, LTT, Xi-CAM.]
From Design to Execution: Tomography @ FXI-18 (NSLS-II)
[Figure: Detector → 2D mosaic at each angle → subdivide into virtual tomographies with Z-depth map → reconstruction → stitching of extracted data.]
Interleaved:
1. Take flats
2. Re-calibrate X-ray
3. Align system
For each image:
a. Apply flat field correction
b. Image quality check; re-acquire if failed
c. Remove outliers
d. Perform ring removal
e. Apply distortion correction
f. Potentially apply point spread function deconvolution
For each tomography scan:
a. Perform inter-angle alignment (rigid x/y shifts to align images)
b. Quick reconstruction to estimate the angle of the IC in theta and phi
c. Warp sinograms to align IC layers in the reconstruction for extraction
d. Reconstruct
e. Layer extraction, segmentation, etc…
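Two of the listed steps are simple enough to sketch directly: per-image flat-field correction with a quality check that triggers re-acquisition (steps a and b for each image), and rigid x/y alignment between angles by FFT cross-correlation (step a for each tomography scan). The `acquire` callable, the saturation-based quality criterion, and the retry limit are hypothetical, and the alignment sketch finds integer-pixel shifts only.

```python
import numpy as np


def flat_field_correct(raw, flat, dark, eps=1e-6):
    """Per-image step a: (raw - dark) / (flat - dark), guarding against division by zero."""
    return (raw - dark) / np.maximum(flat - dark, eps)


def passes_quality_check(raw, saturation_level, max_saturated_frac=0.01):
    """Per-image step b (hypothetical criterion): reject frames with too many saturated pixels."""
    return np.mean(raw >= saturation_level) <= max_saturated_frac


def acquire_corrected(acquire, flat, dark, saturation_level=65535, max_retries=3):
    """Acquire one image, re-acquiring until it passes the check; return (image, attempts)."""
    for attempt in range(1, max_retries + 1):
        raw = np.asarray(acquire(), dtype=float)
        if passes_quality_check(raw, saturation_level):
            return flat_field_correct(raw, flat, dark), attempt
    raise RuntimeError(f"image failed the quality check {max_retries} times")


def estimate_shift(ref, moving):
    """Per-scan step a: integer (dy, dx) with moving ≈ np.roll(ref, (dy, dx), axis=(0, 1))."""
    cc = np.fft.ifft2(np.fft.fft2(moving) * np.conj(np.fft.fft2(ref))).real
    peak = np.unravel_index(np.argmax(cc), cc.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    return tuple(int(p) - n if p > n // 2 else int(p) for p, n in zip(peak, cc.shape))
```

Rolling the moving image by the negated shift, `np.roll(moving, (-dy, -dx), axis=(0, 1))`, registers it back onto the reference; sub-pixel refinement would be needed in practice.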
Recommend
More recommend