PROJECT: COMPUTATIONAL INFRASTRUCTURE. HANDLING THE CHALLENGES FOR CRYO-EM PROCESSING. SIMONS ELECTRON MICROSCOPY CENTER. ASSIGNEE: EDWARD T ENG. DATE: NOVEMBER 3, 2017. CLIENT: STRUCTURAL BIOLOGISTS.
Simons Electron Microscopy Center, New York Structural Biology Center. Staff:
Zhening Zhang (Res. Scientist); Clint Potter (Director); Bridget Carragher (Director); Alex Wei (Technician); Carl Negro (Res. Programmer); Anchi Cheng (Res. Staff Scientist); Sargis Dallakyan (Res. Programmer)
Priyamvada Acharya (Embedded Post Doc.); Giovanna Scapin (Embedded Scientist); Julia Brasch (Embedded Post Doc.); Kotaro Kelly (Post Doc.); Yong Zi Tan (Grad. Student); Venkat Dandey (Post Doc.); Micah Rapp (Grad. Student); Alex Noble (Post Doc.)
Bill Rice (EM Manager); Ed Eng (Staff Scientist); Ashleigh Raczkowski (Senior Technician); Laura Kim (Research Associate); Daija Bobe (Technician); Kelsey Jordan (Technician); Crystal Premo (Administrator)
National Resource for Automated Molecular Microscopy http://nramm.nysbc.org
What is possible? “LIFE IS REALLY SIMPLE, BUT WE INSIST ON MAKING IT COMPLICATED.” –CONFUCIUS (551-479 BCE)
What is possible today? 2.5 Å within a day
What is the timeline? 2.5 Å within a day. [Timeline figure: 0h, 1h, 2h, 4h, 8h, 12h]
Is this routinely done? Workflow validation/testing. Test specimens (symmetry, mass, source):
Aldolase: D2, ~150 kDa, rabbit muscle
Glutamate dehydrogenase: D3, 334 kDa, cow liver
Apoferritin: O, 443 kDa, horse spleen
20S proteasome: D7, 750 kDa, Thermoplasma
60S/80S ribosome: C1, ~2-4 MDa, human or Mycoplasma
What type of computing challenge do you have? (1) Infrastructure to do cryo-EM processing for a research project/lab. (2) Infrastructure to support a multi-user/instrument EM facility.
CryoEM Infrastructure for a lab. Infrastructure to do cryo-EM processing for a research project/lab. Components: Computation, Storage, Software.
CryoEM Infrastructure for a lab. Components: Computation, Storage, Software. Sites shown: thinkmate.com, linuxvixion.com, exxactcorp.com, singleparticle.com
The challenge of cryo-EM computation. (1) Infrastructure to do cryo-EM processing for a research project/lab. (2) Infrastructure to support a multi-user/instrument EM facility.
The challenge of cryo-EM computation: infrastructure to support a multi-user/instrument EM facility. How many instruments? How many users? What do they want to do? What support would you like to provide? Baldwin, et al. Current Opinion in Microbiology, Vol 43, 2017 (in press)
What do your users want to do? (Infrastructure to support a multi-user/instrument EM facility.) Breakdown of projects: Single particle 67%, FIB-SEM 17%, Tomography 8%, 2dx/helical 7%, Other 1%. 400 registered users, 150 active Krios users.
What software do they request to use?
Support required: Single Particle Analysis. What is asked: RELION / FREALIGN / cryoSPARC / EMAN2 / etc…
Support required: Tomography. What is asked: IMOD / Protomo / Dynamo / PEET / PyTom / etc…
Support required: Segmentation/Annotation. What is asked: Amira / IMOD / Dragonfly / etc…
Breakdown of projects: Single particle 67%, FIB-SEM 17%, Tomography 8%, 2dx/helical 7%, Other 1%.
Experience chart. Legend: Beginner/Novice, Intermediate, Expert. Slice labels: 14%, 3%, 83%.
How many instruments are used? Cameras and data generation at SEMC circa 2017:
FEI Titan Krios #1 / #2 / #3: Falcon3 x3, K2 x3
FEI Tecnai F20: DE20, TVIPS 4K CMOS
FEI Tecnai Biotwin: TVIPS 4K CMOS
JEOL 1230: Gatan US4000 CCD
FEI Helios 650 (SEM): ETD, TLD, ICE
Totals: 7 direct detectors on 4 TEMs; 5 CMOS/CCDs on 3 TEMs + 1 SEM
How much data is generated? [Chart: exposures per month; series: Krios & DD cameras, Other scopes & CMOS/CCD]
# TEM exposure images in 2015 & 2016: 1,069,315
# TEM exposure images in 2017: 2,689,276
# TEM exposure images: 3,758,591*
*Total number of saved images since 2015: 766,329,392
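The yearly counts on this slide add up to the stated total, which a few lines of Python can confirm. This is only an arithmetic check on the slide's own numbers; the variable names are made up for illustration.

```python
# Exposure counts as stated on the slide (variable names are illustrative).
images_2015_2016 = 1_069_315
images_2017 = 2_689_276
stated_total = 3_758_591

total = images_2015_2016 + images_2017
# 2017 alone compared with the two previous years combined
ratio = images_2017 / images_2015_2016

print(f"total = {total:,} (matches slide: {total == stated_total})")
print(f"2017 / (2015 + 2016) = {ratio:.2f}")
```

By these figures, 2017 alone produced roughly two and a half times as many exposures as 2015 and 2016 together.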
The challenge of scalable cryo-EM computation
How many instruments? Not enough and getting more
How many users? Growing exponentially
What do they want to do? Everything
What support would you like to provide? As much as possible
SEMC solutions. The overall mission of NRAMM is to develop, test and apply technology for automating and streamlining cryo-electron microscopy (cryoEM) for structural biology. Spotiton
SEMC solutions: on-the-fly data pipeline c. 2017. [Diagram components: Camera; Leginon Workstation; Buffer server; File system / cluster; Web portal; Cloud computing; User; Data transfer station; GLOBUS remote data transfer]
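One way to picture the camera-to-buffer step of an on-the-fly pipeline is a polling pass that copies any file not yet seen. The sketch below is a generic illustration using only the Python standard library; the function name `sync_new_files`, the file name, and the temporary directories are hypothetical, and this is not how Leginon, Appion, or GLOBUS implement data movement.

```python
import shutil
import tempfile
from pathlib import Path


def sync_new_files(camera_dir, buffer_dir, seen):
    """Copy files not yet in `seen` from camera_dir to buffer_dir.

    Returns the list of file names copied in this pass. Hypothetical helper,
    not part of Leginon, Appion, or Globus.
    """
    camera_dir, buffer_dir = Path(camera_dir), Path(buffer_dir)
    buffer_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(camera_dir.iterdir()):
        if path.is_file() and path.name not in seen:
            shutil.copy2(path, buffer_dir / path.name)  # also copies timestamps
            seen.add(path.name)
            copied.append(path.name)
    return copied


# Demo: temporary directories stand in for the camera and buffer storage.
with tempfile.TemporaryDirectory() as cam, tempfile.TemporaryDirectory() as buf:
    (Path(cam) / "exposure_0001.mrc").write_bytes(b"fake frame data")
    seen = set()
    first_pass = sync_new_files(cam, buf, seen)   # copies the one new file
    second_pass = sync_new_files(cam, buf, seen)  # nothing new to copy
    print(first_pass, second_pass)
```

In a running system a pass like this would be repeated on a timer while the microscope session is active, with downstream processing triggered for each newly copied file.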
SEMC solutions: HPC server and storage (DDN)
■ 2 x 42U rack enclosures
■ DDN GRIDScaler GS7K appliance with 1.1PB GPFS parallel file system
■ 420TB DDN WOS object storage for archival
■ 1056 x CPU cores: 44 x SuperMicro nodes, each with 24 x CPU cores and 256GB RAM
■ 4 x GPU nodes, each with one GPU and 128GB RAM; one GPU server with 8 x GPUs and 512GB RAM; 2 x GPU servers, each with 4 x GPUs and 512GB RAM
■ 4 buffer servers, each with 51TB local storage, 2 x GPUs, 128GB RAM and 10G fiber network cards
■ 5 x 36 QSFP port 56Gb/s FDR InfiniBand switches
■ Bright Cluster Manager; Basic Onsite Support; 7x24 remote support
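The hardware totals implied by this list can be cross-checked with a little arithmetic. A minimal sketch, taking every count from the slide; the variable names are illustrative only.

```python
# Tally of the HPC resources listed on the slide (names are illustrative).
cpu_nodes = 44
cores_per_node = 24
total_cores = cpu_nodes * cores_per_node  # slide states 1056 CPU cores

# (number of servers, GPUs per server) for the cluster GPU machines
gpu_servers = [(4, 1), (1, 8), (2, 4)]
cluster_gpus = sum(count * gpus for count, gpus in gpu_servers)

buffer_servers = 4
buffer_gpus = buffer_servers * 2          # 2 GPUs per buffer server
buffer_storage_tb = buffer_servers * 51   # 51TB local storage each

print(f"CPU cores: {total_cores}")
print(f"Cluster GPUs: {cluster_gpus}, buffer-server GPUs: {buffer_gpus}")
print(f"Buffer-server local storage: {buffer_storage_tb} TB")
```

The node count times cores per node reproduces the 1056 cores stated on the slide; the GPU and buffer-storage totals are derived here and do not appear on the slide itself.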
SEMC solutions since 2015: central MySQL database and web server. Size of images: 158.06 TB. # DB records: 766,329,392. Size of database: 7.44 GB. 3,758,591 images; 3,064 tilt series. 3/4 of users who use Leginon also have Appion sessions.
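For storage planning, the figures on this slide give a rough average size per stored image. The sketch below assumes the 158.06 TB covers the 3,758,591 images and that 1 TB means 10**12 bytes; neither assumption is stated on the slide.

```python
# Figures from the slide; the TB definition and image coverage are assumptions.
image_bytes = 158.06 * 10**12   # assumes 1 TB = 10**12 bytes
n_images = 3_758_591            # assumes these images account for all 158.06 TB

avg_mb_per_image = image_bytes / n_images / 10**6
print(f"average stored size per image: {avg_mb_per_image:.1f} MB")
```

Under those assumptions the average comes out at a little over 40 MB per image, which is the kind of number needed to size buffer and archive storage against an expected exposure rate.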
Example: Single-particle workflow
During EM session (SEMC), Leginon session: Setting up workflow; Data acquisition; Workflow optimization if needed
During EM session (SEMC), Appion session: Micrograph/Frame alignment; CTF estimation; Particle picking; Micrograph/Particle curating; Particle sorting; Initial 2D classification; Initial model generation; 3D refinement
After EM session (Home institution/computing cloud/SEMC): 2D classification; 3D classification; 3D refinement; Model building
Example: Single-particle workflow (software)
During EM session (SEMC), Leginon session: Setting up workflow; Data acquisition; Workflow optimization if needed
During EM session (SEMC), Appion session:
Frame alignment: MotionCor2/Unblur/alignframes_lmbfgs/DE frame alignment/etc…
CTF estimation: CTFFind4/gCTF/ACE/etc…
Particle picking: DOG picker/Gautomatch/FindEM/EMAN2/etc…
Micrograph/Particle curating: Appion
Particle sorting: Appion
Initial 2D classification: RELION/cryoSPARC/EMAN2/Xmipp/SPIDER/IMAGIC/sparx/etc…
Initial model generation: VIPER/SIMPLE/SPARX/cryoSPARC/RELION/Optimod/EMAN2/etc…
3D refinement: RELION/FREALIGN/cryoSPARC/EMAN2/Xmipp/IMAGIC/spider/etc…
After EM session (Home institution/computing cloud/SEMC): 2D classification; 3D classification; 3D refinement; Model building
What is the timeline? [Timeline: 0h, 2h, 4h, 12h, 24h, 48h … 96h] [FSC plot: FSC vs. resolution [1/Å]; 0.143 threshold; 3.1 Å; unpublished]
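The resolution quoted on an FSC plot is commonly read off where the curve falls below the 0.143 threshold. A minimal sketch of that lookup follows, using linear interpolation between the two bracketing points; the function name and the sample curve are made up for illustration and are not the data behind this slide.

```python
def resolution_at_threshold(spatial_freq, fsc, threshold=0.143):
    """Return the resolution in Å where the FSC first drops below `threshold`.

    spatial_freq: increasing spatial frequencies in 1/Å
    fsc: FSC values at those frequencies
    Returns None if the curve never crosses the threshold from above.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            f0, f1 = spatial_freq[i - 1], spatial_freq[i]
            c0, c1 = fsc[i - 1], fsc[i]
            # Linear interpolation of the crossing frequency
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            return 1.0 / f_cross
    return None


# Synthetic example curve (not real data).
freqs = [0.1, 0.2, 0.3, 0.4]          # 1/Å
fsc_values = [1.0, 0.9, 0.243, 0.043]
res = resolution_at_threshold(freqs, fsc_values)
print(f"resolution at FSC = 0.143: {res:.2f} Å")
```

For the synthetic curve the crossing lies midway between 0.3 and 0.4 1/Å, so the reported resolution is the reciprocal of 0.35 1/Å.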
What is the timeline? [Timeline: 0h, 2h, 4h, 12h, 24h, 48h … 96h] Software shown along the timeline: MotionCor2, CTFFIND4, gCTF, DoG picker, Template picker, Chimera, coot
What is the timeline? [Timeline: 0h, 2h, 4h, 12h, 24h, 48h … 96h] Software shown along the timeline: MotionCor2, CTFFIND4, gCTF, DoG picker, Template picker, Chimera, coot
Buffer server: 2 x GeForce GTX 1080 GPU; 9 x 8TB 7.2K SATA drives; 1 x 120GB SSD drive
cryoSPARC workstation: 4 x GeForce GTX 1070 GPU; 2 x Ten-Core 2.20GHz 25MB Cache; 8 x 32GB 2400MHz DDR4; 1 x 180GB SATA SSD, 1 x 750GB SATA SSD
RELION workstation: 4 x NVIDIA GeForce GTX TITAN X Pascal; 2 x Ten-Core 2.20GHz 25MB Cache; 8 x 32GB 2400MHz DDR4; 1 x 180GB SATA SSD, 1 x 750GB SATA SSD
The challenge of cryo-EM computation. (1) Infrastructure to do cryo-EM processing for a research project/lab. (2) Infrastructure to support a multi-user/instrument EM facility.
“It does not matter how slowly you go as long as you do not stop.” –Confucius (551-479 BCE)