Processing The Next Generation of Angstrom-Scale Microscopy
Dr Lance Wilson, Senior HPC Consultant @ MASSIVE
Source: https://www.monash.edu/research/infrastructure/delivering-impact/research-outcomes/cryo-em/half-a-million-dollar-tick
Figure 5. 3D structure of the Ebola spike. Beniac DR, Melito PL, deVarennes SL, Hiebert SL, Rabb MJ, et al. (2012) The Organisation of Ebola Virus Reveals a Capacity for Extensive, Modular Polyploidy. PLOS ONE 7(1): e29608. https://doi.org/10.1371/journal.pone.0029608
What is Cryo-Electron Microscopy?
“Seeing Molecular Interactions of Large Complexes by Cryo Electron Microscopy at Atomic Resolution” — Laurie J. Wang & Z.H. Zhou, 2014
Computed Tomography | CryoEM + Photogrammetry
http://www.cmu.edu/me/xctf/xrayct/index.html https://www.maximintegrated.com/en/app-notes/index.mvp/id/4682
https://skfb.ly/TI89
3D reconstruction of the electron density of the aE11 Fab’–polyC9 complex
Institution Strategy
From “Here is your CD of data…” to “Your data is moving to a data management system in the cloud, where you have access to a range of tools and services to start your data analysis.”
[Workflow diagram] Titan Krios collects data (raw movie frames) → Cryo-EM PC running the MyData App → MyTardis (movie storage and sharing; DOI and reuse for publication) → processing via the Strudel desktop & web: Ctffind (corrected stills), Relion (picking), Frealign (model).
What is the scope? Or: how big is the computer?
Compute and Storage Requirements
§ ~1-4 TB raw data per sample (~2000-5000 files)
§ Pipeline analysis with internal & external tools
§ Large-memory GPUs required (>8 GB)
§ Large system memory required (>64 GB)
§ 200-400 CPU cores required
§ Parallel file reads and writes
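A back-of-envelope check of what these requirements imply for the file system (a minimal sketch; the 1.5 GB/s figure is a hypothetical mid-range value, consistent with the 1-2 GB/s bandwidth mentioned in the motion-correction case study later):

```python
# Per-sample data footprint from the slide: ~4 TB across ~5000 files.
raw_tb = 4
n_files = 5000
avg_file_gb = raw_tb * 1024 / n_files
print(f"average file size: {avg_file_gb:.2f} GB")  # -> 0.82 GB

# One full pass over the raw data at an assumed 1.5 GB/s:
hours = raw_tb * 1024 / 1.5 / 3600
print(f"one full read at 1.5 GB/s: ~{hours:.1f} h")
```

Even a single sequential pass over one sample's raw data is close to an hour at that bandwidth, which is why parallel reads and writes appear as a hard requirement.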
How long does each of the steps take? (~2,500 images, ~150,000 particles, 260-pixel boxes)

Task                 Submitted?  GPU?  Nodes  Time
Import               No          -     -      <1 min
Motion Correction    Yes         Yes   3      20 min
CTF estimation       Yes         No    1      20 min
Manual Picking       No          -     -      ?
Autopicking          Yes         Yes   2      40 min
Particle Extraction  Yes         No    1      10 min
2D Classification    Yes         Yes   2      10 min/iteration
3D Classification    Yes         Yes   1      10 min/iteration
3D Refine            Yes         Yes   2      5-10 min/iteration
Movie Refine         Yes         No    1      1 hour
Particle Polishing   Yes         No    1      1-2 hours
Mask Creation        No          -     -      5-30 min
Postprocessing       No          -     -      <1 min

Total: ~3 days
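The fixed-time steps alone account for only a few hours; the iterative classification and refinement steps dominate. A minimal sketch of that arithmetic, assuming a hypothetical 25 iterations per iterative step and midpoints where the table gives a range (real iteration counts vary per data set, which is how the total stretches toward ~3 days once queue wait and manual picking are included):

```python
# Per-step times (minutes) taken from the table above.
fixed_min = {"Import": 1, "Motion Correction": 20, "CTF estimation": 20,
             "Autopicking": 40, "Particle Extraction": 10,
             "Movie Refine": 60, "Particle Polishing": 90,   # midpoint of 1-2 h
             "Mask Creation": 20,                            # midpoint of 5-30 min
             "Postprocessing": 1}
per_iter_min = {"2D Classification": 10, "3D Classification": 10,
                "3D Refine": 7.5}                            # midpoint of 5-10 min
iters = 25  # assumption; real counts vary

total_h = (sum(fixed_min.values()) + iters * sum(per_iter_min.values())) / 60
print(f"~{total_h:.0f} h of compute, excluding queue wait and manual picking")
```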
Options for Processing
§ Workstation — Pro: full user control. Con: limited by a single machine.
§ HPC — Pro: huge resources. Con: tightly controlled, shared.
§ Cloud — Pro: scales easily. Con: cost, complexity, data movement.
Why use OpenStack for this workflow?
https://sites.google.com/site/emcloudprocessing/home/relion2#TOC-Benchmark-Tests
Motion Correction
The electron beam drifts during collection, and the results need to be shifted to account for it (http://www.ncbi.nlm.nih.gov/pubmed/23644547). Software: motioncorr 2.1.
How to achieve maximum processing speed?
§ # GPUs = # raw movies
§ Storage I/O > processing
I/O: 3.4 GB of raw movie data (15 movies); 218 MB of corrected micrographs (15 images).
Processing time: 109 s on a single NVIDIA K80.
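The "# GPUs = # raw movies" idea above is embarrassingly parallel: each movie is corrected independently, so movies can simply be dealt out round-robin across the available GPUs. A minimal sketch of that dispatch logic (the file names are hypothetical, and the `-gpu` flag is illustrative only — motioncorr's real CLI flags vary by version, so check your local install before running anything):

```python
def build_commands(movies, n_gpus):
    """Round-robin each movie onto a GPU; one motioncorr command per movie.

    The commands could then be launched concurrently, e.g. via
    subprocess.Popen or a ThreadPoolExecutor, one worker per GPU.
    """
    return [["motioncorr", movie, "-gpu", str(i % n_gpus)]  # '-gpu' is illustrative
            for i, movie in enumerate(movies)]

cmds = build_commands([f"movie_{i:03d}.mrc" for i in range(6)], n_gpus=4)
for c in cmds:
    print(" ".join(c))
```

With enough GPUs this reduces the wall-clock time to roughly that of the slowest single movie, at which point storage I/O, not processing, becomes the bottleneck, as the slide notes.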
Case study - Motion Correction: 10x speedup!
§ Local desktop: ~3 hrs — limited by local storage and network access, single GPU
§ Remote desktop on MASSIVE: ~45 mins — limited by GPUs (2 per desktop)
§ In-house parallel scripted version: ~4.5 mins — limited by file-system bandwidth (1-2 GB/s)
What approaches/software is used? Relion, Simple, Cryosparc
Scientific Problem Definition (What does a scientist do?)
Typical workflow for processing using Relion2
Hardware for Relion2 and CryoSparc Comparisons
§ Dell HPC nodes: 24 cores (2 CPUs), 256 GB RAM, 4 x K80 GPUs
§ NVIDIA DGX-1: 32 cores (2 CPUs), 512 GB RAM, 8 x P100 GPUs
(The same two configurations were used for both the Relion2 and CryoSparc comparisons.)
[Chart] Cryosparc ab-initio step: run time (mins) vs run number (1-7), K80 vs P100. (24 cores, 128 GB RAM, one K80 or P100 GPU)
[Chart] Cryosparc refinement step: run time (mins) vs run number (1-5), K80 vs P100. (24 cores, 128 GB RAM, one K80 or P100 GPU)
How many CPUs do you need? What effect does the number of CPUs have on analysis time?
[Chart] Relion 3D classification step: processing time (mins) vs number of cores/threads (4, 8, 16), K80 node vs DGX-1. (K80 node = 24 CPUs, 256 GB RAM, 4 x K80; DGX-1 = 32 CPUs, 512 GB RAM, 8 x P100)
[Chart] Relion 3D classification step: DGX-1 processing-time speedup relative to the K80 node vs number of cores/threads (4, 8, 16).
[Chart] Relion 3D classification step: processing time (mins) vs processing threads (3-24), K80 node vs DGX-1.
[Chart] Relion Class3D, MPI task effects: processing time (mins) vs MPI tasks (3, 5, 9, 13, 17), K80 node vs DGX-1.
How many GPUs do you need? Relion2 - Class 3D Step
[Chart] Relion Class3D, effect of number of GPUs: processing time (mins) vs number of GPUs (2, 4, 8), K80 node vs DGX-1.
[Chart] Relion 2D classification step: processing time (mins) vs number of MPI tasks (~cores: 17, 21), K80 node vs DGX-1.
[Chart] Relion 2D classification step: DGX-1 processing-time speedup relative to the K80 node vs number of MPI tasks (17, 21).
How many GPUs do you need? Relion2 - Class 2D Step
[Chart] Relion Class2D, effect of number of K80 GPUs: processing time (mins) vs number of GPUs (4, 8, 12, 16).
How does this hardware compare against workstations?
[Chart] Processing time (hours) vs hardware configuration for the 3D classification step.
[Picture of tweet]
So … what is the solution?
Solution
§ GPU cluster
§ Large high-performance file system
§ Remote desktop with access to the cluster (for GPU and storage)
§ HPC and domain experts optimising the pipeline for the systems
Outcome
§ Data processing faster than collection!
§ Shared resource for optimal use
Acknowledgments
§ Jon Mansour for help gathering benchmarking results
§ Jafar Lie for help gathering benchmarking results
§ Mike Wang (NVIDIA) for access to the NVIDIA DGX-1
Questions?