A deep learning based approach for genetic risk prediction

Raquel Dias, PhD, Senior Staff Scientist, Scripps Research Translational Institute (raqueld@scripps.edu, @RaquelDiasSRTI)
Ali Torkamani, PhD (atorkama@scripps.edu, @ATorkamani)

Whole Genome Sequencing vs. Genotype array [Figure: full data from whole genome sequencing (~80M genetic variants) versus sparse data from a genotype array (~4 million genetic variants), in which untyped variants are shown as "?".]

Genetic imputation problem [Figure: reference haplotypes from HapMap or 1,000 Genomes (whole genome) are used to predict the missing genotypes (?) in the study's typed genotypes (cases and controls, genotype array).]

A typical imputation approach [Figure: the linkage disequilibrium (LD r²) structure of a multiethnic reference panel, the Haplotype Reference Consortium (HRC), is mapped onto the study genotypes, and the prediction fills in the missing genotypes (?).]
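LD r² measures how strongly the alleles at two variants travel together on the same haplotype, which is what lets a reference panel stand in for untyped variants. A minimal sketch of the textbook two-locus formula, r² = D² / (p_A(1 − p_A) p_B(1 − p_B)) with D = p_AB − p_A·p_B, assuming phased haplotypes coded 0/1; the function name `ld_r2` is illustrative and not taken from any imputation tool, and the slide does not say exactly how the mapping step uses r².

```python
from typing import Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """LD r^2 between two biallelic variants, from phased haplotypes coded 0/1.

    hap_a[k] and hap_b[k] are the alleles carried by haplotype k at variants A and B.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need two non-empty haplotype vectors of equal length")
    p_a = sum(hap_a) / n  # frequency of allele 1 at variant A
    p_b = sum(hap_b) / n  # frequency of allele 1 at variant B
    # Frequency of haplotypes carrying allele 1 at both variants.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # r^2 is undefined if either variant is monomorphic
    return d * d / denom


# Four reference haplotypes in which variant B copies variant A exactly (perfect LD).
print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
```

When r² is close to 1, knowing one allele effectively tells you the other; when it is close to 0, the typed variant carries little information about the untyped one.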

Polygenic Risk Score (PRS)

Polygenic Risk Calculation
• Design: 100,000+ subjects with the trait* and without the trait; millions of known variants
• Results: the polygenic risk score, a cumulative sum (Σ) over variants
*The trait can often be heterogeneous, e.g. coronary artery disease = heart attack, stroke, bypass surgery, etc.
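The cumulative sum is commonly computed as a weighted sum: for each variant, the number of risk alleles a person carries (0, 1 or 2) is multiplied by a per-variant weight, typically an effect size from a genome-wide association study, and the products are added up. A minimal sketch assuming that formulation (the slide does not give the weights); the variant IDs and weights below are hypothetical.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Sum of weight * risk-allele dosage over the person's variants that have a weight.

    dosages: variant ID -> risk-allele count (0, 1, 2) or an imputed dosage in [0, 2].
    weights: variant ID -> per-variant effect size (e.g. a GWAS log odds ratio).
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


# Hypothetical variant IDs and weights, for illustration only.
weights = {"var1": 0.25, "var2": -0.5, "var3": 0.125}
person = {"var1": 2, "var2": 1, "var3": 2, "var4": 2}  # var4 has no weight, so it is ignored
print(polygenic_risk_score(person, weights))  # 0.25
```

A weighted variant that is absent from `dosages` contributes nothing, so every variant the array missed and imputation did not recover drops out of the score.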

Objectives
1. More accurate and faster imputation
2. Find important genetic variants
3. Better polygenic risk score calculation

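Our proposed approach [Figure: an autoencoder with an input layer (x), a hidden layer and an output layer (x′); encoding maps the input to the hidden layer and decoding maps the hidden layer back to the output. Example genotype values: 0, 1, 2.]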
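The encode/decode structure can be sketched as a single-hidden-layer autoencoder. This is a forward pass only (no training), in plain Python. The sigmoid activations, the random initialization, and scaling genotypes from 0/1/2 to 0/0.5/1 are assumptions made here rather than details given on the slide, and `TinyAutoencoder` is a hypothetical name.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyAutoencoder:
    """x -> h = sigmoid(W1 x + b1) -> x' = sigmoid(W2 h + b2); untrained, forward pass only."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small random weights; biases start at zero.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    @staticmethod
    def _layer(weights, biases, inputs) -> List[float]:
        # One dense layer followed by a sigmoid activation.
        return [sigmoid(sum(w * v for w, v in zip(row, inputs)) + b) for row, b in zip(weights, biases)]

    def encode(self, x: Sequence[float]) -> List[float]:
        return self._layer(self.w1, self.b1, x)

    def decode(self, h: Sequence[float]) -> List[float]:
        return self._layer(self.w2, self.b2, h)

    def forward(self, x: Sequence[float]) -> Tuple[List[float], List[float]]:
        h = self.encode(x)
        return h, self.decode(h)


genotypes = [1, 1, 1, 1, 0, 2, 0, 0, 2, 2]
ae = TinyAutoencoder(n_in=len(genotypes), n_hidden=4)
hidden, reconstruction = ae.forward([g / 2 for g in genotypes])  # scale 0..2 to 0..1
dosages = [2 * r for r in reconstruction]  # map the output back to the 0..2 genotype scale
```

Training would adjust `w1`, `b1`, `w2` and `b2` so that the output reconstructs the input.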

Denoising autoencoder for image restoration [Figure panels: noise; mask.]
Bigdeli, Siavash Arjomand, and Matthias Zwicker. "Image restoration using autoencoding priors." arXiv preprint arXiv:1703.09964 (2017).
Wang, Ruxin, and Dacheng Tao. "Non-local auto-encoder with collaborative stabilization for image restoration." IEEE Transactions on Image Processing 25.5 (2016): 2117-2129.

Genotype imputation case study example [Figure: ground truth (whole genome sequencing) next to the masked input (genotype array), where a mask hides the untyped variants, shown as "?".]

Case study: 9p21.3 region of the genome
• Length: 59,846 bp
• 846 genetic variants in the reference panel (whole genome data)
   - Approx. 200 common variants
   - Approx. 600 rare variants
• Only 17-47 of these variants are on a genotype array
• Strong association with coronary artery disease (CAD)
• Genotyped and sequenced in many studies
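The common/rare split depends on a minor allele frequency (MAF) cutoff that the slide does not state. A minimal sketch, assuming diploid genotypes coded as alternate-allele counts (0, 1, 2) and a hypothetical 1% cutoff (5% is another widely used choice); the function names are illustrative.

```python
from typing import Sequence


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF of one biallelic variant from diploid genotypes coded 0/1/2."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))  # two alleles per person
    return min(alt_freq, 1.0 - alt_freq)


def classify_variant(genotypes: Sequence[int], maf_cutoff: float = 0.01) -> str:
    """Label a variant 'common' or 'rare'; the default cutoff is an assumption."""
    return "common" if minor_allele_frequency(genotypes) >= maf_cutoff else "rare"


# 200 people, one heterozygous carrier: MAF = 1 / 400 = 0.0025.
print(classify_variant([0] * 199 + [1]))  # rare
```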

Training on the reference panel: data augmentation strategy [Figure: reference whole-genome haplotypes are masked, the masked input is passed through the autoencoder, and the autoencoder produces the reconstructed output.]
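The data augmentation step can be sketched as masking the same reference haplotype repeatedly with a fresh random mask, so that one sequenced haplotype yields many (masked input, full target) training pairs. A minimal sketch, assuming masked alleles are marked `None` and each allele is hidden independently at a fixed rate; `masked_copies` and its parameters are hypothetical, and a real pipeline would still need a numeric encoding for `None` before feeding the autoencoder.

```python
import random
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[List[Optional[int]], List[int]]


def masked_copies(haplotype: Sequence[int], mask_rate: float, n_copies: int, seed: int = 0) -> List[Pair]:
    """Return n_copies (masked_input, target) pairs built from one reference haplotype."""
    rng = random.Random(seed)
    pairs: List[Pair] = []
    for _ in range(n_copies):
        # Each allele is hidden independently with probability mask_rate.
        masked = [None if rng.random() < mask_rate else allele for allele in haplotype]
        pairs.append((masked, list(haplotype)))  # the target is always the unmasked haplotype
    return pairs


reference_haplotype = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1]
training_pairs = masked_copies(reference_haplotype, mask_rate=0.9, n_copies=5)
```

Because each copy gets a different mask, the network sees many partial views of the same haplotype and cannot simply memorize one fixed set of typed positions.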

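Customized Sparsity Loss Function
• Sparsity loss with Kullback-Leibler (KL) divergence / cross entropy for each hidden unit i, where ρ is the target sparsity and ρ̂_i is the average activation of hidden unit i:
$$D_{KL}(\rho \,\|\, \hat{\rho}_i) = \rho \log\frac{\rho}{\hat{\rho}_i} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_i}$$
• Customized loss adjusted for hidden activation sparsity:
$$\mathrm{loss} = \mathrm{MSE} + \beta \sum_{i=1}^{n} D_{KL}(\rho \,\|\, \hat{\rho}_i)$$
• Mean squared error:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$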
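The customized loss combines a reconstruction term (MSE) with a β-weighted KL penalty that pushes each hidden unit's average activation ρ̂ toward a target sparsity ρ. A minimal plain-Python sketch, assuming hidden activations lie in (0, 1) (e.g. sigmoid) and ρ̂ is the mean activation of each hidden unit over a batch; the function names are illustrative.

```python
import math
from typing import Sequence


def kl_sparsity(rho: float, rho_hat: float, eps: float = 1e-8) -> float:
    """D_KL(rho || rho_hat) between two Bernoulli distributions; rho must lie in (0, 1)."""
    rho_hat = min(max(rho_hat, eps), 1.0 - eps)  # keep the logarithms finite
    return rho * math.log(rho / rho_hat) + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat))


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean squared error over the n reconstructed values."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def sparse_autoencoder_loss(y, y_hat, hidden_activations, rho: float, beta: float) -> float:
    """MSE + beta * sum of KL penalties, one per hidden unit.

    hidden_activations: one row per sample, one activation in (0, 1) per hidden unit.
    """
    n_samples = len(hidden_activations)
    n_hidden = len(hidden_activations[0])
    # rho_hat_i: average activation of hidden unit i across the samples.
    rho_hats = [sum(row[i] for row in hidden_activations) / n_samples for i in range(n_hidden)]
    penalty = sum(kl_sparsity(rho, r) for r in rho_hats)
    return mse(y, y_hat) + beta * penalty


loss = sparse_autoencoder_loss([0, 1, 2], [0, 1, 1], [[0.5, 0.5], [0.3, 0.7]], rho=0.05, beta=1.0)
```

When every ρ̂_i equals ρ the penalty vanishes and the loss reduces to the MSE; β sets how strongly sparsity is traded against reconstruction accuracy.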

Hyperparameters to be optimized
• β (weight of the sparsity penalty)
• ρ (target sparsity)
• Activation functions
• L1/L2 regularizers
• Learning rate
• Batch size

Parallel Grid Search hyperparameter optimization approach [Figure: hyperparameter combinations (100× grid search samples) are trained on grid samples numbered 1 to 10,000 (10,000× training samples), and the performance of each trained model is recorded.]
• 9 GPUs available: 7 GTX 1080, 1 Titan V, 1 Titan Xp
• 860 hours (sequential run, 100 epochs)
• Performance metrics: accuracy, loss, sparsity, MSE
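A parallel grid search can be sketched as enumerating every combination of candidate hyperparameter values and spreading the resulting training jobs across the available GPUs. A minimal sketch using `itertools.product` and round-robin assignment; the candidate values are hypothetical placeholders, not the grid actually searched, and the training step itself is left out.

```python
import itertools
from typing import Any, Dict, List, Sequence


def build_grid(space: Dict[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """Every combination of the candidate values, one dict per combination."""
    names = list(space)
    return [dict(zip(names, values)) for values in itertools.product(*(space[n] for n in names))]


def assign_round_robin(configs: List[Dict[str, Any]], n_gpus: int) -> List[List[Dict[str, Any]]]:
    """Combination i goes to GPU i % n_gpus, so each GPU gets a near-equal share."""
    queues: List[List[Dict[str, Any]]] = [[] for _ in range(n_gpus)]
    for i, config in enumerate(configs):
        queues[i % n_gpus].append(config)
    return queues


# Hypothetical candidate values for some of the hyperparameters to be optimized.
space = {
    "beta": [0.001, 0.01, 0.1],
    "rho": [0.01, 0.05],
    "activation": ["relu", "sigmoid", "tanh"],
    "learning_rate": [1e-4, 1e-3],
}
grid = build_grid(space)                     # 3 * 2 * 3 * 2 = 36 combinations
queues = assign_round_robin(grid, n_gpus=9)  # nine GPUs, as listed on the slide
```

Round-robin ignores that the GPUs differ in speed (GTX 1080 vs. Titan V and Titan Xp); a shared work queue that each GPU pulls from as it finishes would balance the load better.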

Grid Search results: training accuracy

Grid Search results: assessing the best hyperparameter values

Effect of hyperparameter values on training accuracy [Figure panels: β, ρ and learning rate, each plotted against training accuracy measured as the Pearson correlation (r²).]
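Accuracy here is reported as the squared Pearson correlation (r²) between true and predicted genotypes. A minimal sketch for a single variant, assuming true genotypes coded 0/1/2 and predictions given as continuous dosages; how the presentation aggregates r² across variants is not stated.

```python
from typing import Sequence


def pearson_r2(truth: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes and predicted dosages at one variant."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_p = sum(predicted) / n
    # Unnormalized covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, predicted))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_t == 0 or var_p == 0:
        return float("nan")  # undefined when either side is constant
    return cov * cov / (var_t * var_p)


truth = [0, 1, 2, 2, 0]
predicted = [0.1, 0.9, 1.8, 2.0, 0.2]
print(pearson_r2(truth, predicted))
```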

Optimizing batch size: training accuracy [Figure: accuracy and loss versus learning steps for 10, 50, 100 and 1000 batches.]

Optimizing batch size: training run time [Figure: accuracy and run time (hours) for 10, 50, 100 and 1000 batches.]

Testing on multiple case studies
• Atherosclerosis Risk in Communities (ARIC)
   - More than 3,000 samples
   - Whole genome sequencing (846 variants, 0% mask, ground truth)
   - Affymetrix 6.0 genotype array (17 variants, 98% mask, input data)
• Framingham Heart Study (FHS)
   - More than 500 samples
   - Whole genome sequencing (846 variants, 0% mask, ground truth)
   - Illumina 500K genotype array (47 variants, 95% mask, input data)
   - Illumina 5M genotype array (93 variants, 89% mask, input data)
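Building the model input for these case studies amounts to keeping the sequenced genotypes only at positions the array typed and masking every other variant. A minimal sketch with hypothetical variant IDs and array layout; for example, 17 array variants out of 846 sequenced ones leaves 829/846 ≈ 0.98 of positions masked, in line with the 98% quoted for ARIC.

```python
from typing import List, Optional, Sequence, Set


def mask_to_array(wgs_genotypes: Sequence[int], variant_ids: Sequence[str], array_ids: Set[str]) -> List[Optional[int]]:
    """Keep genotypes at positions the array typed; mark every other position as None."""
    return [g if vid in array_ids else None for g, vid in zip(wgs_genotypes, variant_ids)]


def mask_fraction(masked: Sequence[Optional[int]]) -> float:
    """Share of positions that are masked."""
    return sum(1 for g in masked if g is None) / len(masked)


# Hypothetical region: 846 sequenced variants, 17 of which are on the array.
ids = [f"v{i}" for i in range(846)]
array_ids = set(ids[::50])  # every 50th variant: v0, v50, ..., v800 (17 variants)
masked_input = mask_to_array([0] * 846, ids, array_ids)
print(round(mask_fraction(masked_input), 2))  # 0.98
```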

Accuracy in additional case studies: proposed approach versus common statistical methodology [Figure panels: performance on all variants; performance on rare variants.]

Accuracy in additional case studies: proposed approach versus common statistical methodology [Figure panels: performance on all variants; performance on common variants.]

Run time: proposed approach versus common statistical methodology

Linkage disequilibrium structure: ARIC [Figure: linkage disequilibrium (LD) r² for the ground truth and the prediction, shown for all variants, rare variants and common variants.]

Linkage disequilibrium structure: FHS [Figure: linkage disequilibrium (LD) r² for the ground truth and the prediction, shown for all variants, rare variants and common variants.]

Interpretability: identifying representative genetic variants [Figure: maximal information criteria.]
