
USING GPU AND POWER8 TO EXPLORE HOW GENOMES FOLD

Ido Machol, Aiden Lab, Baylor College of Medicine, Rice University. GTC 2015.


  1. USING GPU AND POWER8 TO EXPLORE HOW GENOMES FOLD. Ido Machol, Aiden Lab, Baylor College of Medicine, Rice University. GTC 2015

  2. THE HUMAN GENOME IS LONG! 3 BILLION letters, 2 METERS: …CGTTTACGAAAATCGCAAAACTTTCGATACCCATAGGCTACTGATCATACGACCGTTTACGAAAATCGAAACCTTTCCGATCTAGGCTAC… (Figure labels: Nucleus, Cell, 6 μm)

  3. (Scale axis: 100 Mb, 10 Mb, 1 Mb, 100 kb, 10 kb, 1 kb, 100 bp, 10 bp)

  4. SAME GENOME, DIFFERENT FUNCTIONS

  5. PART I: TECHNOLOGY

  6. MICROSCOPY & FLUORESCENT IN SITU HYBRIDIZATION FISH

  7. CONTACT MAPPING Exploring structure via proximity

  8. FACEBOOK CONTACT MAP (example: Homer). Times in the same photo: 0-3 (lives far away); 4-11 (lives nearby); always (same person)

  9. Simpsons' Contact Map: # of Pictures Together (color scale 0 to 16)
      2  0  1  2  1  0  1  0  0
      0  3  2  1  0  0  0  0  0
      1  2 16  6  5  4 11  1  1
      2  1  6  8  6  3  4  0  0
      1  0  5  6  8  4  5  1  0
      0  0  4  3  4  5  5  0  0
      1  0 11  4  5  5 11  1  1
      0  0  1  0  1  0  1  2  1
      0  0  1  0  0  0  1  1  1
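
The counting behind a contact map like this one can be sketched in a few lines of Python. This is only an illustration: the function name `contact_map` and the three photos are made up for the example and are not the data behind the slide. Each photo adds one to every pair of people in it, and a person paired with themselves fills the diagonal ("always" in the same photo as themselves).

```python
from itertools import combinations_with_replacement


def contact_map(photos, people):
    """Count how often each pair of people appears in the same photo.

    The result is a symmetric matrix; the diagonal holds the number of
    photos each person appears in.
    """
    index = {name: k for k, name in enumerate(people)}
    n = len(people)
    counts = [[0] * n for _ in range(n)]
    for photo in photos:
        present = sorted({index[name] for name in photo})
        # Every pair (including a person with themselves) gets one count.
        for a, b in combinations_with_replacement(present, 2):
            counts[a][b] += 1
            if a != b:
                counts[b][a] += 1
    return counts


# Hypothetical photo albums, invented for this sketch.
people = ["Homer", "Marge", "Bart"]
photos = [{"Homer", "Marge"}, {"Homer", "Marge", "Bart"}, {"Bart"}]
simpsons_map = contact_map(photos, people)
for row in simpsons_map:
    print(row)
```

High counts suggest two people live close together; Hi-C applies the same idea to pairs of genomic positions.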

  10. Hi-C 3D Genome Sequencing

  11. Hi-C: genome-wide Chromosome Conformation Capture Erez Lieberman-Aiden, Nynke van Berkum et al. Science 2009

  12. Computational Challenge I: alignment, calculate contacts. Sequence …CTGCCTCCTCGCGG CCGCGTGGTGGCAG…; align to the reference genome. (Figure labels: DNA, Reference)

  13. Alignment is not trivial: …CTGCC _ TCCTCGCGG… may appear with an insertion (…CTGCC C TCCTCGCGG…), a deletion (…CTGC __ TCCTCGCGG…), or a substitution (…CTG AA_ TCCTCGCGG…)
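
To make the three edit types concrete, here is a minimal sketch of the classic edit-distance (Levenshtein) dynamic program in Python. It is not the aligner used on the cluster, which the slides do not name; production aligners rely on indexed references and heuristics. The function name `edit_distance` is an assumption for this example, and the test strings are the slide's sequences with the spacing removed.

```python
def edit_distance(read, reference):
    """Minimum number of insertions, deletions and substitutions
    needed to turn `read` into `reference` (Levenshtein distance)."""
    # previous[j] = distance between the first i-1 letters of `read`
    # and the first j letters of `reference`.
    previous = list(range(len(reference) + 1))
    for i, read_base in enumerate(read, start=1):
        current = [i]
        for j, ref_base in enumerate(reference, start=1):
            substitution = previous[j - 1] + (read_base != ref_base)
            deletion = previous[j] + 1      # drop a letter from the read
            insertion = current[j - 1] + 1  # add a letter to the read
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


reference = "CTGCCTCCTCGCGG"
with_insertion = "CTGCCCTCCTCGCGG"    # one extra C
with_deletion = "CTGCTCCTCGCGG"       # one C missing
with_substitution = "CTGAATCCTCGCGG"  # CC replaced by AA

print(edit_distance(with_insertion, reference))
print(edit_distance(with_deletion, reference))
print(edit_distance(with_substitution, reference))
```

The table costs time proportional to read length times reference length, which is why aligning billions of reads against a 3-billion-letter genome needs smarter algorithms and a lot of hardware.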

  14. Computational HW and SW setup

  15. Rice RSCG PowerOmics hardware
      8 x Power8 servers, models 8247-22L and 8247-42L
      2 sockets x 12 cores x 8 threads = 192 virtual cores each; total of 1,536 virtual cores in the cluster
      • 4 x 256 GB RAM
      • 2 x 1024 GB RAM
      • 2 x 256 GB RAM with NVIDIA Tesla K40
      Byte order: bi-endian

  16. GPUs: Tesla K40
      • Stream processors: 2880
      • Core clock: 745 MHz
      • Boost clock(s): 810 MHz, 875 MHz
      • Memory clock: 6 GHz GDDR5
      • VRAM: 12 GB
      • Single precision: 4.29 TFLOPS
      • Double precision: 1.43 TFLOPS (1/3)

  17. Storage
      • IBM GPFS Storage Server (Model 24)
      • 4 x JBOD
      • Total of 361 TB fast scratch disk space (up to 1.4 petabytes)
      • FlashSystem 840, 20 TB flash

  18. Interconnect
      • 56-gigabit 36-port FDR IB switch
      • Mellanox next-gen Connect-IB FDR Host Channel Adapters
      • 10-Gigabit Ethernet
      • Internet2

  19. Rice RSCG PowerOmics software
      Cluster management
      • IBM Platform LSF, PPM, PAC, PowerKVM 2.1.0
      Operating system
      • Ubuntu 14.04 (little-endian) + Red Hat Enterprise Linux 7.0
      Storage
      • Mellanox OFED 2.4-1
      • GPFS 4.1
      Scientific
      • BioBuilds 2014.11

  20. Challenge: alignment of billions of contacts. High-resolution map: 13 billion reads (…CTGCCTCCTCGCGG…) forming 5 billion contacts in the map. IBM Power8 cluster: 675 read alignments / second / CPU core, 192 cores, about 27 hours
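
The slide's estimate can be checked with a short calculation. The sketch below assumes perfect scaling across cores, which the slides do not claim; the function names `alignment_hours` and `cpu_days` are made up for this example, and the default rates are the figures quoted on the slide.

```python
def alignment_hours(total_reads, reads_per_sec_per_core=675, cores=192):
    """Wall-clock hours to align `total_reads`, assuming every core
    sustains the quoted rate and the work splits evenly."""
    seconds = total_reads / (reads_per_sec_per_core * cores)
    return seconds / 3600


def cpu_days(total_reads, reads_per_sec_per_core=675):
    """Single-core time, in days, to align `total_reads`."""
    return total_reads / reads_per_sec_per_core / 86400


# 13 billion reads for the high-resolution map.
print(f"{alignment_hours(13e9):.1f} hours on 192 cores")

# 1 billion reads, the monthly volume quoted in the summary slide.
print(f"{cpu_days(1e9):.1f} CPU days, {alignment_hours(1e9):.1f} hours on 192 cores")
```

Under these assumptions the 13 billion reads take a little under 28 hours, in line with the slide's "about 27 hours", and the monthly load of 1 billion reads works out to roughly 17 CPU days or about 2 hours on 192 cores.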

  21. Hi-C GENERATES GENOME-WIDE CONTACT MAPS (figure labels: Genome, Chromosome)

  22. Hi-C GENERATES GENOME-WIDE CONTACT MAPS (figure label: Genome)

  23. Hi-C GENERATES GENOME-WIDE CONTACT MAPS (figure labels: Genome, Chromosome 8; color scale 0 to 700 reads/250 kb²)

  24. Hi-C GENERATES GENOME-WIDE CONTACT MAPS (figure labels: A; color scale 0 to 700 reads/250 kb²)

  25. Hi-C GENERATES GENOME-WIDE CONTACT MAPS (figure labels: A, B; color scale 0 to 700 reads/250 kb²)
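
A map with a "reads per 250 kb" scale comes from binning: each contact is a pair of genomic positions, and each position is assigned to a fixed-size bin. The sketch below shows that step for a single chromosome. The function name `build_contact_map`, the toy coordinates and the 1 Mb chromosome length are assumptions for the example, not the lab's pipeline.

```python
def build_contact_map(contacts, chrom_length, bin_size=250_000):
    """Bin (position_a, position_b) contacts on one chromosome into a
    symmetric matrix of counts, one row and column per bin."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in contacts:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the map symmetric
    return matrix


# Toy contacts on a hypothetical 1 Mb chromosome (4 bins of 250 kb).
toy_contacts = [(100_000, 600_000), (120_000, 700_000), (10, 20)]
toy_map = build_contact_map(toy_contacts, chrom_length=1_000_000)
for row in toy_map:
    print(row)
```

Smaller bins give higher resolution but need far more contacts to fill each pixel, which is the trade-off behind the later move from millions to billions of contacts.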

  26. PART II: BIOLOGY. Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Erez Lieberman-Aiden, Nynke van Berkum et al. Science, 2009

  27. Genomic analysis of compartments: the two compartments correlate strongly with open and closed chromatin. (Figure labels: Genes, Chromosome 14, 1 Mb, 100 kb)

  28. The whole genome is plaid (figure axes: chromosomes 1 to 22 and X)

  29. A TOUR OF THE NUCLEUS

  30. Organization observed at three distinct scales: NUCLEAR SCALE (100 Mb), CHROMOSOME SCALE (10 Mb), MEGABASE SCALE (1 Mb)

  33. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Suhas Rao*, Miriam Huntley*, Neva Durand, Elena Stamenova, Ivan Bochkov, James Robinson, Adrian Sanborn, Ido Machol, Arina Omer, Eric Lander, Erez Lieberman Aiden. Cell, 2014

  34. More Contacts, Higher Resolution: 30 million contacts vs. 5 billion contacts

  35. Detection of Chromatin Loops Genome-wide via Hi-C (figure labels: loci A and B with flanking positions A±ε, A±2ε, B±ε, B±2ε)

  36. Into the loops (figure labels: L1, L2, L3)

  37. Computational Challenge III: loop calling. Which one shows a loop?

  38. 3D Map Features X ✔ X X

  39. Computational Challenge III: loop calling
      • Apply 4 filters for each pixel.
      • 20-gigapixel image.
      • Millions of parallel filters.
      NVIDIA Tesla GPU: 200x faster than the previous CPU implementation, from 3 weeks to 3 hours.
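
The per-pixel filtering idea can be illustrated with a simplified sketch: compare each pixel with the average of a ring of nearby pixels and keep the pixels that stand out. The slide says four filters are applied per pixel; this example uses a single ring-shaped neighborhood as a stand-in, runs on the CPU in plain Python, and uses made-up names (`local_enrichment`, `call_loops`) and thresholds. It is not the lab's GPU code.

```python
def local_enrichment(matrix, i, j, peak_width=1, window=3):
    """Ratio of pixel (i, j) to the mean of the surrounding ring.

    The ring is the (2*window+1)-wide square around the pixel, minus
    the central (2*peak_width+1)-wide square.
    """
    total, count = 0.0, 0
    for r in range(max(0, i - window), min(len(matrix), i + window + 1)):
        for c in range(max(0, j - window), min(len(matrix[r]), j + window + 1)):
            if abs(r - i) <= peak_width and abs(c - j) <= peak_width:
                continue  # skip the peak region itself
            total += matrix[r][c]
            count += 1
    if total == 0:
        return 0.0  # empty neighborhood: nothing to compare against
    return matrix[i][j] / (total / count)


def call_loops(matrix, threshold=2.0):
    """Return pixels whose value exceeds their local background
    by at least `threshold` times."""
    return [
        (i, j)
        for i in range(len(matrix))
        for j in range(len(matrix[i]))
        if local_enrichment(matrix, i, j) >= threshold
    ]


# Toy 7x7 map: flat background with one bright pixel in the middle.
grid = [[1] * 7 for _ in range(7)]
grid[3][3] = 10
print(call_loops(grid))
```

Each pixel's test reads only its own neighborhood and writes only its own result, so the tests are independent of one another. That independence is what lets the work be spread over thousands of GPU threads.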

  40. 10,000 Loops in the Human Genome

  41. Loops turn genes on and off (figure panels: lymphoblastoid cell, lung fibroblast cell)

  42. SUMMARY OF COMPUTATIONAL EFFORTS

  43. Sequence alignment proportions
      Genome data production and analysis
      • In about 36 months we produced the sequence equivalent of more than 2200x coverage of the human genome.
      • For reference, the Human Genome Project produced 12.6x coverage over the span of 4 years.
      Storage
      • We currently have 25 TB of raw sequenced data.
      • We sequence 1 TB each month.
      • After processing the raw sequenced data, we store 3 TB of raw and processed data.

  44. Computational speed-up
      Cluster processing
      • We produce 1 billion reads per month.
      • Power8 is capable of processing alignments at 675 reads/second per CPU core.
      • That is 50% faster than the cluster system we were using before.
      • At this speed, we consume about 17 "CPU days" per month.
      • With the Power8 cluster having over 192 cores, the jobs complete processing in about 2 hours.
      GPU processing
      • Using the NVIDIA Tesla K40, we run our loop calling algorithm over a 20-gigapixel map 200x faster than the CPU implementation.
      • Instead of 3 weeks, we get the work done in only 3 hours.

  45. aidenlab.org/juicebox

  46. GREETINGS FROM ANOTHER DIMENSION. Aiden Lab and Broad Institute: Eric Lander, Erez Lieberman Aiden, Jim Robinson, Suhas Rao, Miriam Huntley, Neva C Durand, Elena Stamenova, Adrian Sanborn, Arina Omer, Ivan Bochkov, Olga Dudchenko, Robert Nnake, Su-Chen Huang, Muhammad Shamim, Chris Lui, Sarah Nyquist, Sanjit Batra, Ashok Cutkosky, Najeeb Tarazi, Jian Li
