
Accelerating Sequencing with GPU Computing and Deep Learning, Johnny Israeli


  1. Accelerating Sequencing with GPU Computing and Deep Learning Johnny Israeli

  2. COMPUTE TRENDS. Figure: GPU-computing performance vs. single-threaded performance, log scale from 10^2 to 10^7; annotated growth rates: 1.5X per year, 1.1X per year, 1.5X per year. Layer labels: APPLICATIONS, ALGORITHMS, SYSTEMS, CUDA, ARCHITECTURE.

  3. COMPUTE TRENDS: Publications

  4. Sequencing Trends

  5. SEQUENCING TRENDS: Sequencing Data Growing in Volume and Complexity. Decreasing Cost; Increasing Read Length; Rise of Single Cell Data.

  6. SEQUENCING TRENDS

  7. SEQUENCING TRENDS. Figure: Worldwide Annual Sequencing Capacity, log scale from 10^12 to 10^21, years 2000 to 2025.

  8. Sequencing Data Types

  9. SEQUENCING TRENDS: Genomics (*ENA Database)

  10. SEQUENCING TRENDS: Transcriptomics (*ENA Database)

  11. SEQUENCING TRENDS: Epigenomics (*ENA Database)

  12. SEQUENCING TRENDS: Nanopore Long Read Sequencing (*ENA Database)

  13. Variant Calling

  14. Variant Calling. Workflow: Sequence DNA, Map to Reference.
      Reference:      TGGATTTGAAAAC G GAGCAAATGACTG
      Illumina Reads: TGGATTTGAAAAC G GAGCAAATGACTG
                      TGGATTTGAAAAC G GAGCAAATGACTG
                      TGGATTTGAAAAC A GAGCAAATGACTG
                      TGGATTTGAAAAC A GAGCAAATGACTG
                      TGGATTTGAAAAC A GAGCAAATGACTG
                      TGGATTTGAAAAC G GAGCAAATGACTG
      Annotation: Likely heterozygous variant
      ● Identify sites with potential mismatch
      ● True variants or instrument errors?
      ● SNPs or insertions or deletions?
      ● Heterozygous or homozygous variants?
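The questions on this slide can be made concrete with a small sketch. It assumes the first sequence on the slide is the reference and the remaining six are reads, so the candidate site shows three G and three A. The function name `call_site` and both thresholds are hypothetical choices for illustration; this is not how GATK or Parabricks call variants, which use statistical models rather than fixed cut-offs.

```python
from collections import Counter

# Pileup column at the candidate site, read off the slide (assumed grouping).
REFERENCE_BASE = "G"
READ_BASES = ["G", "G", "A", "A", "A", "G"]


def call_site(ref_base, read_bases, min_alt_fraction=0.2, hom_fraction=0.8):
    """Naive genotype call from the alternate-allele fraction.

    Both thresholds are illustrative assumptions, not values from the talk.
    """
    counts = Counter(read_bases)
    depth = sum(counts.values())
    # Collect (count, base) pairs for every non-reference base observed.
    alts = [(n, base) for base, n in counts.items() if base != ref_base]
    if not alts:
        return "homozygous reference", None, 0.0
    alt_count, alt_base = max(alts)
    alt_fraction = alt_count / depth
    if alt_fraction < min_alt_fraction:
        # Too few supporting reads: treat as likely instrument error.
        return "homozygous reference", alt_base, alt_fraction
    if alt_fraction >= hom_fraction:
        return "homozygous variant", alt_base, alt_fraction
    return "heterozygous variant", alt_base, alt_fraction


genotype, alt, fraction = call_site(REFERENCE_BASE, READ_BASES)
print(genotype, alt, fraction)
```

With half of the reads supporting A, the sketch reports a heterozygous G>A site, in line with the slide's "Likely heterozygous variant" annotation.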

  15. Example Pileup Input Data. Figure labels: Read Index, Position, Heterozygous SNP.

  16. GATK Variant Calling Pipeline: Align to Reference → Sort → Mark Duplicates → Calibrate → Call Variants → Joint Call → Filter Variants

  17. Accelerated GATK Variant Calling Pipeline: Align to Reference → Sort → Mark Duplicates → Calibrate → Call Variants → Joint Call → Filter Variants. Parabricks stages: Alignment, Preprocessing, Variant Calling, Joint Genotyping, Variant Processing.

  18. Accelerated Variant Calling Pipelines: Whole Genome Processing in Minutes. Tools: Alignment + Preprocessing, HaplotypeCaller, Mutect2, GenotypeGVCFs, DeepVariant. Parabricks pipelines: Germline, Somatic, Copy Number. Stages: Alignment, Preprocessing, Variant Calling, Joint Genotyping, Variant Processing.

  19. Deep Averaging Network (DAN)
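The slide gives only the model name, so the following is a generic sketch of a deep averaging network rather than the author's model: embed each input token, average the embeddings, and pass the mean through feed-forward layers. It is written in plain Python for portability, although the deck describes a PyTorch-based model. The class labels, layer sizes, and random weights are all assumptions.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
EMBED_DIM, HIDDEN_DIM = 4, 8
CLASSES = ["no variant", "heterozygous", "homozygous"]  # hypothetical labels


def random_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Random values stand in for parameters that training would learn.
embedding = dict(zip("ACGTN", random_matrix(5, EMBED_DIM)))
w_hidden = random_matrix(HIDDEN_DIM, EMBED_DIM)
w_out = random_matrix(len(CLASSES), HIDDEN_DIM)


def linear(matrix, vector):
    """Matrix-vector product (bias terms omitted for brevity)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def dan_forward(bases):
    """Average base embeddings, apply a ReLU hidden layer, then softmax."""
    vectors = [embedding[b] for b in bases]
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    hidden = [max(0.0, h) for h in linear(w_hidden, mean)]
    logits = linear(w_out, hidden)
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    return [e / sum(exps) for e in exps]


probs = dan_forward("GGAAAG")
shuffled = dan_forward("AAAGGG")
print(dict(zip(CLASSES, probs)))
```

Because the embeddings are averaged, `probs` and `shuffled` agree up to rounding: plain averaging ignores the order of its inputs. How the author's model encodes position or read identity is not stated on the slides.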

  20. DAN Development ● PyTorch-based 1D model ● Learned embeddings of bases ● Encoding variant proposals ● Downsample easy variant candidates during training
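The last bullet, downsampling easy variant candidates during training, might look like the sketch below. The slide does not say how a candidate is judged easy or what fraction is kept, so the `easy` flag, the `keep_fraction` value, and the function name `downsample_easy` are hypothetical.

```python
import random


def downsample_easy(candidates, keep_fraction=0.1, seed=0):
    """Keep every hard candidate and a random fraction of the easy ones."""
    rng = random.Random(seed)
    kept = []
    for cand in candidates:
        # Hard candidates always pass; easy ones pass with keep_fraction odds.
        if not cand["easy"] or rng.random() < keep_fraction:
            kept.append(cand)
    return kept


# Hypothetical candidate list: one in ten sites is marked as hard.
candidates = [{"pos": i, "easy": i % 10 != 0} for i in range(1000)]
train_set = downsample_easy(candidates)
n_hard = sum(not c["easy"] for c in train_set)
print(len(candidates), len(train_set), n_hard)
```

The effect is that training batches are no longer dominated by candidates the model already classifies correctly, while all difficult sites are retained.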

  21. Variant Calling Errors

  22. Variant Calling Error Breakdown

  23. ATAC Sequencing

  24. DNA: Open and Closed. Closed DNA: inactive. Open DNA: active. Open DNA changes affect development & disease.

  25. ATAC Sequencing: Mapping Open DNA Sites. Steps: Sequence Open DNA; Map & Count Reads. Figure labels: Open DNA site, Open DNA site.
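The "Map & Count" step can be illustrated with a minimal sketch that turns mapped read intervals into a per-base coverage track and then marks regions above a depth threshold as open. The read coordinates and the threshold are made up for illustration, and fixed thresholding is a simplification: dedicated peak callers such as MACS2 use statistical models instead.

```python
def coverage_track(read_intervals, region_length):
    """Per-base read depth from half-open [start, end) read intervals."""
    diff = [0] * (region_length + 1)
    for start, end in read_intervals:
        diff[start] += 1  # depth rises where a read begins
        diff[end] -= 1    # and falls just past where it ends
    depth, running = [], 0
    for delta in diff[:region_length]:
        running += delta
        depth.append(running)
    return depth


def open_sites(depth, min_depth):
    """Contiguous runs with depth >= min_depth, as half-open intervals."""
    sites, start = [], None
    for pos, d in enumerate(depth):
        if d >= min_depth and start is None:
            start = pos
        elif d < min_depth and start is not None:
            sites.append((start, pos))
            start = None
    if start is not None:
        sites.append((start, len(depth)))
    return sites


# Hypothetical mapped reads on a 20-base region.
reads = [(2, 6), (3, 7), (4, 8), (14, 18), (15, 19)]
depth = coverage_track(reads, 20)
sites = open_sites(depth, min_depth=2)
print(depth)
print(sites)
```

The two intervals returned correspond to the two "Open DNA site" labels in the slide's figure: places where reads pile up.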

  26. ATAC-seq Limits. ATAC-seq signal degrades due to: • Less sequencing • Low quality sample preparation • Small cell populations

  27. AtacWorks SDK: AI-Denoised ATAC-seq Data Processing. Workflow: Sequence Open DNA; Map, Align, Count. Panels: High Quality Sequencing; Low Quality Sequencing; Low Quality Sequencing Denoised with AtacWorks AI.

  28. AtacWorks Model: Denoising + Open Chromatin Identification. Input (Noisy ATAC-Seq data), then Resblock 1 through Resblock 7. Outputs: Predicted Coverage (Evaluation: MSE, Pearson correlation) and Predicted open chromatin (Evaluation: AUPRC). Resblock detail: three Conv + ReLU pairs with a residual sum (⊕).
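The two coverage metrics named on this slide, MSE and Pearson correlation, can be computed as in the sketch below. The two tracks are invented numbers standing in for a high-coverage signal and a denoised prediction; they are not results from AtacWorks. AUPRC, the metric for the open chromatin output, is not covered here.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length tracks."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length tracks."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical per-base coverage values.
clean = [0.0, 1.0, 4.0, 6.0, 3.0, 1.0]
denoised = [0.0, 1.0, 3.0, 5.0, 3.0, 1.0]

error = mse(denoised, clean)
correlation = pearson(denoised, clean)
print(error, correlation)
```

MSE penalizes absolute differences in depth, while Pearson correlation rewards matching the shape of the signal regardless of scale, so the two metrics complement each other.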

  29. Denoising Low Sequencing Data: AtacWorks identifies open chromatin from low-coverage data. Tracks: 50 Million Reads; 1 Million Reads; 1 Million Reads + AtacWorks.

  30. Genome-wide Sequencing Reduction: AtacWorks Reduces Sequencing Requirements 3x. Labels: 1M Reads; 1M Reads + AtacWorks.

  31. Denoising Low Quality Sample: AtacWorks improves signal-to-noise ratio in low quality samples. Axis: Distance from transcription start site.

  32. Denoising Single Cell ATAC-seq Data: AtacWorks Improves Open DNA Detection From Few Cells. Axis: Open DNA Detection auPRC. Labels: 90 Cells; 90 Cells With AtacWorks.

  33. AtacWorks SDK
      SDK on Clara Genomics: https://github.com/clara-genomics/AtacWorks
      AtacWorks Preprint: https://www.biorxiv.org/content/10.1101/829481v1
      Benefits: Reduce Sequencing Cost; Improve Sample Quality; Increase Single Cell Resolution
      Labels: 1M Reads; 1M Reads + AtacWorks; 90 Cells; 90 Cells + AtacWorks

  34. Genome Assembly

  35. Long Read De Novo Assembly. Step 1: Mapping to detect overlaps between reads. Step 2: Overlap graph traversal to generate draft genomes. Step 3: Error correction to polish genomes. Figure labels: Original reads, Draft genome. Example sequences: ACTCGGTCATTCGTGCTTTATC, GCGTTATCGTCTACTTCGT.
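Steps 1 and 2 can be sketched on a toy scale with exact suffix-prefix overlaps. The three reads below are hypothetical fragments cut from the first example sequence on the slide, and they are error-free. Real long reads are noisy, so overlappers such as minimap rely on approximate matching rather than the exact comparison used here.

```python
def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def merge(a, b, min_len=3):
    """Join two reads along their overlap, or return None if there is none."""
    n = suffix_prefix_overlap(a, b, min_len)
    return a + b[n:] if n else None


# Hypothetical error-free reads, already in left-to-right order.
reads = ["ACTCGGTCATT", "TCATTCGTGCT", "GTGCTTTATC"]

# Step 1: detect the overlap between neighbouring reads.
first_overlap = suffix_prefix_overlap(reads[0], reads[1])

# Step 2: walk the (here linear) overlap path to build a draft sequence.
draft = reads[0]
for read in reads[1:]:
    draft = merge(draft, read)

print(first_overlap, draft)
```

Here the traversal is a simple left-to-right walk because the reads are pre-ordered; an assembler has to discover that order from the overlap graph.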

  36. Genome Assembly Workflow. Genome Assembly Pipeline stages: Overlap, Assemble, Align, Polish, DL Polish. Tools: MiniMap, MiniASM, Racon x 5, Medaka.
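To give a feel for what the polish stage achieves, here is a deliberately simplified stand-in: a column-wise majority vote over reads that are assumed to be already aligned and gap-padded to equal length. Racon itself builds its consensus with partial order alignment, which this sketch does not implement; the aligned reads are invented for illustration.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Column-wise majority vote over equal-length, gap-padded reads."""
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":  # a winning gap means the column is dropped
            consensus.append(base)
    return "".join(consensus)


# Hypothetical aligned reads: one substitution error and one deletion.
aligned = [
    "ACTCGGTCATT",
    "ACTCGGTAATT",
    "ACTC-GTCATT",
    "ACTCGGTCATT",
]
polished = majority_consensus(aligned)
print(polished)
```

Each individual error is outvoted by the other reads, which is the basic reason repeated polishing rounds improve a draft assembly.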

  37. Accelerated Genome Assembly Workflow, Before ClaraGenomicsAnalysis. Genome Assembly Pipeline stages: Overlap, Assemble, Align, Polish, DL Polish. Tools: MiniMap, MiniASM, Racon x 5, Medaka. GPU modules: cuDNN.

  38. Accelerated Genome Assembly Workflow, ClaraGenomicsAnalysis 0.1. Genome Assembly Pipeline stages: Overlap, Assemble, Align, Polish, DL Polish. Tools: MiniMap, MiniASM, Racon x 5, Medaka. GPU modules: cudaPOA, cuDNN.

  39. Accelerated Genome Assembly Workflow, ClaraGenomicsAnalysis 0.2. Genome Assembly Pipeline stages: Overlap, Assemble, Align, Polish, DL Polish. Tools: MiniMap, MiniASM, Racon x 5, Medaka. GPU modules: cudaAligner, cudaPOA, cuDNN.

  40. Accelerated Genome Assembly Workflow, ClaraGenomicsAnalysis 0.3. Genome Assembly Pipeline stages: Overlap, Assemble, Align, Polish, DL Polish. Tools: MiniMap, MiniASM, Racon x 5, Medaka. GPU modules: cudaAligner, cudaPOA, cuDNN, cudaMapper.

  41. ClaraGenomicsAnalysis SDK: Enabling Accelerated Genome Assembly. Bacteria Genome Assembly Acceleration. Assembly Pipeline stages: Overlap, Assemble, Align, Polish, DL Polish. Tools: MiniMap, MiniASM, Racon x 5, Medaka. GPU modules: cudaAligner, cudaPOA, cuDNN, cudaMapper. Comparison: Azure v32 CPU vs. V100 GPU.

  42. CLARA GENOMICS SW: Open Source CUDA-Accelerated Sequencing Analysis Tools. APPLICATIONS: Reference Applications (BASECALLING, GENOME ASSEMBLY, AI-DENOISED ATAC-SEQ); Integration with 3rd Party Applications and Workflows. ClaraGenomicsAnalysis SDK: Optimized C++ API, Python API. AtacWorks SDK: Transfer Learning, Inference. C++ and Python APIs: cudaMapper, cudaPOA, cudaAligner, Genomics I/O, Reference Models. CUDA Accelerated HPC and CUDA Deep Learning Modules.

  43. Useful Links
      • Parabricks: https://www.parabricks.com
      • ClaraGenomicsAnalysis
        • SDK on GitHub: https://github.com/clara-genomics/ClaraGenomicsAnalysis
        • C++ API Examples: cudapoa, cudaaligner
        • Python API Examples: cudapoa, cudaaligner
      • AtacWorks
        • SDK on GitHub: https://github.com/clara-genomics/AtacWorks
        • AtacWorks Preprint: https://www.biorxiv.org/content/10.1101/829481v1
      • 3rd party integrations:
        • Racon: https://github.com/lbcb-sci/racon
        • Raven: https://github.com/lbcb-sci/raven
        • Bonito: https://github.com/nanoporetech/bonito
      • Additional GPU Accelerated Genomics Applications:
        • Kipoi Model Zoo: https://ngc.nvidia.com/catalog/containers/hpc:kipoi
        • SigProfiler: https://github.com/AlexandrovLab/SigProfilerExtractor

  44. Accelerating Sequencing with GPU Computing and Deep Learning Johnny Israeli
