ScipionCloud: Large-scale cryo-electron microscopy image processing on commercial and academic clouds
Who are we? The Instruct cryoEM Image Processing Center Instruct: The European Research Infrastructure for Structural Biology Providing access to state of the art structural biology infrastructure for researchers
What is Cryo Electron Microscopy? Among the structural biology (SB) techniques at the core of the Instruct ESFRI project, electron microscopy under cryogenic conditions (“cryo-EM”) is currently the fastest growing area, having been named “Method of the Year (2015)” by Nature Methods.
Why do we hear so much about Electron Microscopy? Because thanks to: 1) The very good performance of current microscopes 2) The very good image acquisition characteristics of direct electron detectors 3) The very good new software for 3D reconstruction and classification, it is now possible to solve the structure of large and flexible macromolecular complexes without 3D crystals, from small amounts of not very concentrated samples.
CryoEM for drug discovery Cryo-EM resolving the structure of the Ebola virus key glycoprotein in complex with therapeutic antibodies
Typical EM Workflow 16 cores, 2GB/core
Hardware revolution in CryoEM processing Traditionally: HPC clusters or fat nodes. Now two lines of improvement emerge: • Graphics Processing Units (GPUs): algorithms are being ported to GPUs and new ones developed • Cloud platforms
Plethora of EM software packages: Our answer “Scipion” Workflow Integrator Bringing software integration to EM in workflows
Scipion Framework
Scipion Framework Scipion encapsulates: • Parallelization: handled by each EM program or by Scipion -> OpenMPI • Environment setup, libraries • Batch system submission: Scipion templates • Use of GPUs: implemented in the EM packages, each with its own requirements. – Relion 2.0: Nvidia cards with compute capability of at least 3.5; for particles larger than 200 px, 2 GPUs with at least 4 GB RAM. – MotionCor2: CUDA 8
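The Relion 2.0 GPU requirement listed above can be written as a small check. This is only a sketch: the `Gpu` class and the function `meets_relion2_requirements` are hypothetical names introduced for illustration (they are not part of Scipion or Relion), and it assumes that "200p" on the slide means 200 pixels and that the 4 GB minimum applies to each of the two GPUs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gpu:
    """Hypothetical description of one GPU (not a Scipion or Relion type)."""
    name: str
    compute_capability: Tuple[int, int]  # (major, minor), e.g. (3, 5)
    ram_gb: float


def meets_relion2_requirements(gpus: List[Gpu], particle_size_px: int) -> bool:
    """Apply the rule from the slide, under the assumptions in the lead-in."""
    # Only cards with compute capability >= 3.5 count at all.
    capable = [g for g in gpus if g.compute_capability >= (3, 5)]
    if not capable:
        return False
    # Particles larger than 200 px: at least 2 capable GPUs with >= 4 GB each.
    if particle_size_px > 200:
        large_enough = [g for g in capable if g.ram_gb >= 4]
        return len(large_enough) >= 2
    return True


# Illustrative (made-up) hardware description.
node = [Gpu("gpu-a", (3, 5), 4), Gpu("gpu-b", (3, 5), 4)]
print(meets_relion2_requirements(node, particle_size_px=512))
```

A check like this would sit naturally before job submission, so that a protocol fails early instead of partway through a long GPU run.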
Scipion distributions • Binaries • Source code + EM packages autoinstall • ScipionCloud: - Public AMI on AWS EC2 (EU Ireland and US North Virginia and Oregon regions) - Virtual Appliance on EGI AppDB - Vagrant file and CVMFS (Westlife project) - Puppet + Cloudify (Westlife project)
ScipionCloud • Ubuntu 14.04 LTS • Scipion release 1.1 (source git) • Most important EM packages compiled with CUDA (GPU support) • Nvidia driver + CUDA toolkit (7.5 & 8.0) • Guacamole (remote desktop) • StarCluster (only AWS)
ScipionCloud profiling
Profiling workflow Acquisition: 0 Network transfer. Preprocessing: 1 Import movies, 2 BIM correction, 3 CTF estimation, 4 Particle picking, 5 Particle extraction. Processing: 6 2D classification, 7 Initial model, 8 3D classification, 9 3D refinement. Postprocessing: 10 3D postprocessing.
Profiling data transfer • 966 movies, 8Kx8K -> 6.6 TB raw data • Used Aspera connect (from EMPIAR DB) • Tested bbcp and rsync
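To put the dataset size in perspective, the figures above give the average movie size and, for a given sustained throughput, a rough transfer time. This is a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1000 GB = 1,000,000 MB), and the 100 MB/s rate is a hypothetical value, not a measurement from this profiling.

```python
RAW_DATA_TB = 6.6   # total raw data, from the slide
N_MOVIES = 966      # number of movies, from the slide


def average_movie_size_gb(total_tb: float = RAW_DATA_TB,
                          n_movies: int = N_MOVIES) -> float:
    """Average size of one movie in GB (decimal units assumed)."""
    return total_tb * 1000 / n_movies


def transfer_hours(total_tb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to move total_tb at a sustained throughput in MB/s."""
    total_mb = total_tb * 1_000_000
    return total_mb / throughput_mb_per_s / 3600


print(f"Average movie size: {average_movie_size_gb():.2f} GB")
# 100 MB/s is an illustrative rate only.
print(f"Transfer at 100 MB/s: {transfer_hours(RAW_DATA_TB, 100):.1f} h")
```

Even at a sustained 100 MB/s the transfer takes most of a day, which is why the choice of transfer tool is profiled as a step of its own.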
Profiling machine types
Environment | Instance | vCPUs | RAM (GB) | GPU model | GPU RAM (GB) | Cost ($/hour)
AWS EC2 Ireland | g2.2xlarge | 8 | 15 | GRID K520 | 4 | 0.702
AWS EC2 Ireland | p2.8xlarge | 32 | 488 | Tesla K80 | 12 | 7.776
AWS EC2 Ireland | r3.8xlarge | 32 | 244 | - | - | 0.888
AWS EC2 Ireland | x1.32xlarge | 128 | 1952 | - | - | 2.96
FedCloud CESNET | universe | 40 | 232 | - | - | -
FedCloud IISAS | gpu1cpu6 | 6 | 24 | Tesla K20 | 4 | -
FedCloud IISAS | gpu2gpu12 | 12 | 48 | Tesla K20 | 4 | -
Local | asimov | 32 | 512 | - | - | 1.85 (est)
Local | titanxp | 32 | 128 | Titan XP | 12 | -
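Profiling results
Step | Program | AWS EC2 Ireland machine type | AWS time (hours) | AWS cost ($) | FedCloud machine type | FedCloud time (hours) | Local server CNB machine type | Local time (hours)
Transfer movies; Align movies; CTF estimation | Aspera; motioncor2 GPU; ctffind4 | g2.2xlarge | 36 | 26 | 1gpu6cpu | 23 | Local server | 41
Particle picking | Xmipp3 | g2.2xlarge | Interactive | - | 1gpu6cpu | Interactive | Local server | Interactive
Particle extraction | Relion 2.0 | g2.2xlarge | 0.6 | 0.5 | 1gpu6cpu | 0.4 | Local server | 0.4
2D classification | Relion 2.0 GPU | p2.8xlarge | 6 | 42 | 2gpu12cpu | 25 | Local server | 8
Initial volume | Eman 2.12 | p2.8xlarge | 0.08 | 0.7 | 2gpu12cpu | 0.16 | Local server | 0.22
3D classification | Relion 2.0 GPU | p2.8xlarge | 0.6 | 4.7 | 2gpu12cpu | 2.1 | Local server | 1.3
3D refinement | Relion 2.0 GPU | p2.8xlarge | 0.7 | 5.6 | 2gpu12cpu | 2.6 | Local server | 1.8
Postprocessing | Relion 2.0 | p2.8xlarge | 0.003 | 0.03 | 2gpu12cpu | 0.004 | Local server | 0.003
The following results are not comparable with those above, since the particle size was 512 px instead of 200 px.
3D refinement | Relion 1.4 CPU | x1.32xlarge | 28 | 448 | universe | 166 | Local server CPU | 74
3D refinement | Relion 1.4 CPU | r3.8xlarge | 88 | 261 | | | |
3D refinement | Relion 1.4 CPU | 4 r3.8xlarge | 27 | 325 | | | |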
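The AWS cost figures for the GPU steps appear consistent with running time multiplied by the hourly instance price from the machine types slide. The sketch below works that out for two steps. It assumes that this is how the costs were derived and that 3D classification and 3D refinement ran on p2.8xlarge. The function names are hypothetical.

```python
# Hourly prices ($/hour) as listed on the machine types slide.
HOURLY_PRICE_USD = {
    "g2.2xlarge": 0.702,
    "p2.8xlarge": 7.776,
}


def step_cost(instance: str, hours: float) -> float:
    """Estimated cost of one step: running time times hourly price."""
    return HOURLY_PRICE_USD[instance] * hours


def workflow_cost(steps) -> float:
    """Sum the cost of (instance, hours) pairs."""
    return sum(step_cost(instance, hours) for instance, hours in steps)


# 3D classification: 0.6 h reported, cost reported as 4.7 $.
print(f"3D classification: {step_cost('p2.8xlarge', 0.6):.2f} $")
# 3D classification plus 3D refinement (0.7 h reported).
print(f"Both 3D steps: {workflow_cost([('p2.8xlarge', 0.6), ('p2.8xlarge', 0.7)]):.2f} $")
```

The small differences from the reported costs would be expected if the times on the slide are rounded.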
Conclusions • GPUs have changed the EM processing paradigm – Time / Cost • Cloud platforms can be a good solution for small labs that do not want to invest in hardware, or for occasional needs (e.g. training) • ScipionCloud allows scientists to try and use the Scipion framework without dealing with installation and configuration
Plans for the future • Improve remote desktop visualization – Update Guacamole installation – Integrate VirtualGL + TurboVNC with Guacamole • Upgrade to Ubuntu 16.04 • Dynamic cluster support on Federated Cloud • Improve image contextualization -> INDIGO solutions
Acknowledgments Projects: • EGI Engage Competence Center • Instruct Pilot EM cloud computing People: • Enol Fernandez (EGI.eu) • Boris Parak (CESNET) • Viet Tran and the other support staff at IISAS GPUCloud
References • Scipion project: http://scipion.cnb.csic.es • MoBrain project: https://mobrain.egi.eu • INSTRUCT: http://www.structuralbiology.eu • Westlife project: http://about.west-life.eu • StarCluster: http://star.mit.edu/cluster/index.html • Guacamole: http://guacamole.incubator.apache.org