23451: Developing a deep learning and AI platform for life science research


  1. 23451: Developing a deep learning and AI platform for life science research
     Robert Esnouf, robert@well.ox.ac.uk
     Head of Research Computing Core, Wellcome Centre for Human Genetics
     Director of Research Computing, Big Data Institute
     Research Computing Strategy Officer, Nuffield Department of Medicine
     University of Oxford, UK
     Robert Esnouf, University of Oxford: GTC-Europe 11 October 2017

  2. Overview of talk
     • The WCHG, the BDI and the Old Road Campus
     • Areas of interest for applying DL techniques in the clinical/life sciences
     • Early promising results
     • Expanding provision for DL/AI and general purpose GPU computing
     • Acknowledgments

  3. The Wellcome Centre for Human Genetics
     About 500 researchers in a purpose-built institute
     • “to advance the understanding of genetically-related conditions through multi-disciplinary research”
     • Sequencing, statistical genetics, disease-focused research (diabetes, obesity, heart disease, malaria), optical microscopy, MRI, functional genetics, crystallography & electron microscopy
     Opened in 1999, the first building on the “Old Road Campus”, surrounded by five hospitals in Headington, east Oxford

  4. Computing growth in the WCHG
     • Genetics was largely a lab-based science with small separate servers for each research group
     • Next-generation sequencing (~2007) changed all that and in 2009 I started to build a shared infrastructure for the whole of the WCHG
     • WCHG now has HPC cluster of ~4200 CPU cores; 5x Tesla K80, 8x Tesla P100 and consumer cards; ~6.7PB raw GPFS and ~5PB other storage

  5. Big data is transforming the study of human biology and disease
     [Figure labels: Environmental data; Death registries; Screening programmes; Imaging; Cancer registries; Hospital records; Genetic data; Employment records; Pathology records; Primary care data; Built environment; Pharmacy records]

  6. The Oxford Big Data Institute: The Li Ka Shing Centre for Health Information and Discovery
     Interdisciplinary and problem-focused research institute of 350 researchers working on the acquisition and analysis of population-scale data resources linking detailed biological measurement with longitudinal information on health, treatment and outcome.
     • Cohorts: Prospective cohorts (UKB, China, Mexico); Disease-focused cohorts; Partnerships with NHS / NIHR; Tropical Medicine overseas centres; WHO & National ID surveillance
     • Measurement technologies: Imaging; Genomics and other ‘omics; Sensors; Electronic healthcare records; Patient-interactive systems
     • Integrative analysis methods: Statistics; Epidemiology; Machine learning; Software development; Computational ecosystem
     • Data access and sharing: Consent; Privacy and security; Information governance; Intellectual property; Standards and protocols

  7. A research computing infrastructure for the WCHG and BDI
     • Linking with dark fibre and quad EDR InfiniBand
     • Expanding shared HPC and high-performance storage
     • Creating a scalable virtualization platform on OpenStack
     • Secure multisite scalable S3 object store
     • GPU-accelerated virtual desktop infrastructure and independent identity management and authorization
     • Opening facility across Oxford departments to drive efficient collaboration

  8. UK Biobank: wearable accelerometer data
     • 103,712 participants with 7 days data per participant
     • 100Hz tri-axial acceleration data and 0.2Hz temperature/light information
     • Accelerometers better than self-reporting! (R = 0.48 – 0.60 vs. R = 0.07 – 0.28)
     • Objective measures of physical activity more strongly associated with mortality
     [Figure labels: Self-reporting: 50%, Accelerometer: 5%; Self-reporting: 38%, Accelerometer: 5%]
     Aiden Doherty
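
The sampling figures on this slide imply a raw data volume that can be worked out directly. The sketch below does that arithmetic in Python. The participant count, duration, sampling rate and axis count come from the slide; the assumption that every participant has a complete seven days of data, and the 2 bytes per value used for the size estimate, are illustrative and not stated in the talk.

```python
# Figures taken from the slide.
PARTICIPANTS = 103_712
DAYS = 7
ACCEL_HZ = 100          # tri-axial acceleration sampling rate
AXES = 3

SECONDS_PER_DAY = 24 * 60 * 60


def accel_values_per_participant() -> int:
    """Acceleration values for one participant, assuming a complete week of data."""
    return DAYS * SECONDS_PER_DAY * ACCEL_HZ * AXES


def total_accel_values() -> int:
    """Acceleration values across the whole cohort."""
    return PARTICIPANTS * accel_values_per_participant()


def raw_size_terabytes(bytes_per_value: int = 2) -> float:
    """Uncompressed size in TB (10**12 bytes); bytes_per_value is an assumption."""
    return total_accel_values() * bytes_per_value / 1e12


if __name__ == "__main__":
    print(f"{accel_values_per_participant():,} values per participant")
    print(f"{total_accel_values():,} values in total")
    print(f"~{raw_size_terabytes():.1f} TB at an assumed 2 bytes per value")
```

Under these assumptions the cohort holds on the order of 10^13 acceleration values, which is the scale that motivates shared high-performance storage rather than per-group servers.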

  9. Predicting malaria risk
     • Relate environmental factors (temperature, rainfall etc.) to malaria prevalence.
     • Using point surveys and environment from annual 5km x 5km raster pixels.
     • We already use stacking: train a number of machine learning models and feed predictions from these models into a meta-learner. We use geostatistical models as our meta-learners.
     Tim Lucas and Pete Gething
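
Stacking as described on this slide can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the two base learners (a linear model and a k-nearest-neighbour regressor), the synthetic covariates, and all function names are hypothetical, and an ordinary least-squares model stands in for the geostatistical meta-learner used in the talk.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares linear model with intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


def fit_knn(X, y, k=5):
    """k-nearest-neighbour regressor; returns a predict function."""
    def predict(Xn):
        # Pairwise squared distances between new points and training points.
        d = ((Xn[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict


BASE_LEARNERS = [fit_linear, fit_knn]


def out_of_fold_predictions(X, y, n_folds=5, seed=0):
    """Each row gets base-model predictions from models that never saw it."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    Z = np.zeros((len(X), len(BASE_LEARNERS)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(X)), held_out)
        for j, fit in enumerate(BASE_LEARNERS):
            Z[held_out, j] = fit(X[train], y[train])(X[held_out])
    return Z


def fit_stack(X, y):
    """Train the meta-learner on out-of-fold predictions, then refit base models."""
    Z = out_of_fold_predictions(X, y)
    meta = fit_linear(Z, y)  # stand-in for the geostatistical meta-learner
    base = [fit(X, y) for fit in BASE_LEARNERS]
    return lambda Xn: meta(np.column_stack([m(Xn) for m in base]))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-2, 2, size=(300, 2))   # synthetic "environmental covariates"
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 300)
    model = fit_stack(X[:200], y[:200])
    rmse = np.sqrt(np.mean((model(X[200:]) - y[200:]) ** 2))
    print(f"held-out RMSE of the stacked model: {rmse:.3f}")
```

The meta-learner is trained on out-of-fold predictions so that it does not reward base models for memorising their own training rows. A geostatistical meta-learner would additionally model spatial correlation between survey points, which this sketch omits.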

  10. Base-calling data from Oxford Nanopore Technologies sequencers
      [Figure: current trace labelled with bases]
      • As DNA or RNA pass through the pore, current traces are recorded from which the sequence of bases can be inferred (‘base-calling’)
      • State-of-the-art base-callers use deep neural networks to interpret the current signals, improving on older methods from 71% (HMM; R7.3 chemistry) to 90% accurate (DNN; R9 chemistry)
      • With more training, DNNs may be able to detect modified bases (e.g. methylation patterns)
      Hannah Roberts and Gerton Lunter

  11. Detecting rare genetic conditions from craniofacial features
      • There are many, many rare genetic conditions that often go undiagnosed
      • Something like 1 in 12 people has one of these conditions
      • Often these conditions are also manifest in craniofacial features
      • www.minervaandme.com does image analysis on faces to predict genetic conditions
      • With better feature recognition and DL techniques researchers expect to be able to detect more conditions more reliably
      Michael Ferlaino and Chris Nellåker

  12. Deep neural networks as models for the brain
      • Deep neural networks (DNNs) were inspired by the brain and learn similar features
      • DNNs could take further inspiration from the brain
      • Can we build more sophisticated or cognitive neural representations into DNNs?
      • Such as the brain’s GPS system
      [Figure panels: Artificial neuron firing field; Real neuron firing field]
      This approach will offer:
      • Insights into principles underlying neural representations in the brain
      • New DNN architectures capable of powerful, brain-like computations
      Jessie Liu and Tom Nichols

  13. Deep learning of chromatin features to predict islet-specific SNP effects
      Agata Wesolowska-Andersen, Chris Holmes and Mark McCarthy

  14. CNNs capture motifs of input ChIP-seq and known islet transcription factors
      Agata Wesolowska-Andersen, Chris Holmes and Mark McCarthy

  15. Deep learning predicts regulatory effects for high PPA SNPs
      Agata Wesolowska-Andersen, Chris Holmes and Mark McCarthy

  16. Predicting Expression using Convolutional Neural Networks (CNNs)
      peaBrain
      • a promoter-derived embedding and abundance (pea) model
      • a convolutional neural network that leverages DNA sequence to predict expression
      • can be used to predict both average gene expression and variation in expression (between individuals)
      Moustafa Abdalla, Chris Holmes and Mark McCarthy
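
To make concrete how a convolutional layer can "read" DNA sequence, the sketch below one-hot encodes a sequence and slides a single filter along it. It is an illustration only and not the peaBrain architecture: the 4-channel A/C/G/T encoding, the hand-written "TATA" filter and the function names are assumptions, and a trained network would learn many filters from data rather than use a fixed one.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) array; unknown bases (e.g. N) stay all-zero."""
    encoded = np.zeros((len(seq), len(BASES)))
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            encoded[i, j] = 1.0
    return encoded


def conv1d_valid(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a (width, 4) kernel along a (length, 4) sequence with no padding."""
    width = kernel.shape[0]
    windows = np.stack([x[i:i + width] for i in range(len(x) - width + 1)])
    return (windows * kernel).sum(axis=(1, 2))


if __name__ == "__main__":
    # A hand-written filter that matches the motif "TATA" exactly; a trained CNN
    # would learn its filter weights from data instead.
    motif_filter = one_hot("TATA")
    scores = conv1d_valid(one_hot("GGCTATAAGC"), motif_filter)
    activations = np.maximum(scores, 0.0)   # ReLU
    print(activations)                      # largest where the motif aligns
    print(int(activations.argmax()))        # start position of the best match
```

Each filter acts as a position-specific scoring matrix, which is why learned first-layer filters can often be compared with known transcription-factor motifs.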

  17. GPUs are necessary for computational tractability
      Dataset: 19k genes x 4 kilo-basepairs x 32 channels (18.47 GB) representing the “core” promoter sequence of all protein-coding genes in the human reference genome
      [Figure labels: Single Tesla K80; Quad E5-4640 (64 threads); Single Tesla P100; Single Tesla K80]
      Moustafa Abdalla, Chris Holmes and Mark McCarthy
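
The dataset size quoted on this slide can be sanity-checked from its dimensions. The sketch below uses the rounded figures from the slide (19k genes, 4 kbp, 32 channels); the storage precision is not stated in the talk, so the three byte widths tried here are assumptions.

```python
def tensor_bytes(genes: int, positions: int, channels: int, bytes_per_value: int) -> int:
    """Size of a dense (genes, positions, channels) array in bytes."""
    return genes * positions * channels * bytes_per_value


if __name__ == "__main__":
    # Rounded figures from the slide: 19k genes x 4 kbp x 32 channels.
    GENES, POSITIONS, CHANNELS = 19_000, 4_000, 32
    for name, width in [("float16", 2), ("float32", 4), ("float64", 8)]:
        size = tensor_bytes(GENES, POSITIONS, CHANNELS, width)
        print(f"{name}: {size / 1e9:.2f} GB = {size / 2**30:.2f} GiB")
```

With these rounded inputs only the 8-byte case lands near the quoted 18.47 GB, which suggests, but does not confirm, double-precision storage. The exact gene count and sequence length are not given on the slide.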

  18. CNNs already outperform previous computational and experimental methods
      [Figure panels: Regularized Linear Regression; Neural Network]
      • Green: fraction of genes whose expression can be predicted using the model
      • R² is the average over repeated out-of-sample (test) sets
      Moustafa Abdalla, Chris Holmes and Mark McCarthy

  19. Provision for DL/AI in WCHG/BDI
      • Until Easter 2017, GPUs were mainly used for electron tomography (“dynamo”) and single-particle electron microscopy (“relion”)
        • Dell C4130 with 4x K80; Dell R730 with 1x K80; Scan workstation with TitanXp
        • Free-for-all access
      • Adding a tiered set of local and shared resources:
        • Initial exploration and testing
          • 3x Gigabyte servers each with 4x GTX 1080Ti
          • 1x SuperMicro workstation with 1x GTX 1080Ti
          • 1x Scan workstation with 1x TitanXp
        • Mid-scale training and inference along with image analysis
          • 1x Dell R730 with 1x K80
          • 1x Dell C4130 with 4x K80
          • 2x Dell C4130 each with 4x P100 (SXM2)
          • 1x Scan workstation with 1x V100 (PCIe)
      • Controlling access within Univa Grid Engine
