Large-scale Cancer Genomics Data Analysis, David Haussler

  1. Large-scale Cancer Genomics Data Analysis. David Haussler, Center for Biomolecular Science and Engineering, UC Santa Cruz

  2. Cancer Genomics Hub (CGHub): being built to store BAM and VCF files for the TCGA, TARGET and CGAP/CGCI projects. Designed for 25,000 cases with an average of 200 gigabytes per case, or 5 petabytes (5 × 10^15 bytes) in total, scalable to 20 petabytes. Hardware: General Parallel File System, dual RAID 6 subsystems, redundant I/O paths, 16 application processors and 12 storage controllers. Co-location opportunities.
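
A quick arithmetic check of the sizing above, assuming decimal (SI) units, i.e. 1 GB = 10^9 bytes and 1 PB = 10^15 bytes: 25,000 cases at 200 GB each is 5 × 10^15 bytes, and at the same average case size the 20 PB ceiling would hold 100,000 cases. The function names are illustrative only.

```python
# Back-of-the-envelope check of the CGHub sizing, in decimal (SI) units.
GB = 10**9   # bytes per gigabyte
PB = 10**15  # bytes per petabyte


def total_storage_bytes(cases: int, avg_gb_per_case: int) -> int:
    """Raw storage needed for `cases` cases averaging `avg_gb_per_case` GB each."""
    return cases * avg_gb_per_case * GB


def cases_supported(capacity_pb: int, avg_gb_per_case: int) -> int:
    """How many average-sized cases fit in `capacity_pb` petabytes."""
    return (capacity_pb * PB) // (avg_gb_per_case * GB)


print(total_storage_bytes(25_000, 200) / PB)  # 5.0 (petabytes)
print(cases_supported(20, 200))               # 100000 cases at the 20 PB ceiling
```

These figures count raw data only; RAID 6 parity and file-system overhead mean the physical disk capacity has to be larger.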

  3. CGHub goals: Enable direct comparison and combined analysis of many large-scale cancer genomics datasets. Aggregate enough data to provide the statistical power to attack the full complexity of cancer mutations. Set standards for data storage and exchange, and encourage data sharing. Maintain compatibility with EGA, dbGaP, ICGC, the 1000 Genomes Project, ENCODE and other large-scale genomics efforts (e.g., VCF format, data access coordination).

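  4. Given the same BAM files, different mutation-calling pipelines do not completely agree (case TCGA-13-0725):

     Center   Total calls   Called by both others   Called by at least 1 other
     Broad    3,194         62%                     85%
     UCSC     2,688         74%                     89%
     WUSTL    3,125         63%                     82%

     Venn diagram of the three call sets: 1,982 calls shared by all three centers; 276 by Broad and UCSC only; 442 by Broad and WUSTL only; 126 by UCSC and WUSTL only; 494 by Broad only; 304 by UCSC only; 575 by WUSTL only.
     There is still work to do to harden mutation-calling software.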
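
The seven Venn-diagram counts on this slide (1,982; 494; 304; 575; 276; 442; 126) are consistent with the per-center totals and percentages when assigned to regions as below. This Python sketch recomputes each center's row from those region counts; the region assignment is an assumption reconstructed from the slide's numbers, not the authors' own code.

```python
# Recompute per-center agreement for TCGA-13-0725 from Venn-diagram regions.
# Region assignment is reconstructed from the slide (it reproduces every total).
CENTERS = ("Broad", "UCSC", "WUSTL")

# Key: the set of centers that made the call; value: number of such calls.
regions = {
    frozenset({"Broad"}): 494,
    frozenset({"UCSC"}): 304,
    frozenset({"WUSTL"}): 575,
    frozenset({"Broad", "UCSC"}): 276,
    frozenset({"Broad", "WUSTL"}): 442,
    frozenset({"UCSC", "WUSTL"}): 126,
    frozenset(CENTERS): 1982,
}


def agreement(center: str) -> tuple[int, int, int]:
    """(total calls, % also called by both others, % also called by at least one other)."""
    total = sum(n for centers, n in regions.items() if center in centers)
    by_both_others = regions[frozenset(CENTERS)]
    called_alone = regions[frozenset({center})]
    return (total,
            round(100 * by_both_others / total),
            round(100 * (total - called_alone) / total))


for c in CENTERS:
    print(c, agreement(c))
```

Running it prints (3194, 62, 85) for Broad, (2688, 74, 89) for UCSC and (3125, 63, 82) for WUSTL, matching the table.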

  5. We are just beginning to look at accuracy and consistency in the detection of structural variation. Case study: UCSC and Broad analysis of whole-genome GBM data.

  6. Samples Analyzed

  7. Gene fusions: BamBam called 167 and dRanger called 188, with 136 potentially overlapping events.
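
The slide does not say how BamBam and dRanger calls were judged to "potentially overlap". One common approach, shown here as a hypothetical sketch, treats two rearrangement calls as the same event when both breakpoints join the same chromosome pair and each lies within a tolerance window. The `Junction` type, the 500 bp default window and the coordinates are all assumptions for illustration.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    """One rearrangement call: its two breakpoint ends (hypothetical format)."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


def normalize(j: Junction) -> Junction:
    """Order the two ends so A-B and B-A reports of one event compare alike."""
    if (j.chrom_a, j.pos_a) <= (j.chrom_b, j.pos_b):
        return j
    return Junction(j.chrom_b, j.pos_b, j.chrom_a, j.pos_a)


def near(x: Junction, y: Junction, window: int) -> bool:
    """True if both ends are on matching chromosomes and within `window` bp."""
    return (x.chrom_a == y.chrom_a and x.chrom_b == y.chrom_b
            and abs(x.pos_a - y.pos_a) <= window
            and abs(x.pos_b - y.pos_b) <= window)


def overlapping(calls_1, calls_2, window=500):
    """Calls from `calls_1` that have at least one nearby call in `calls_2`."""
    a = [normalize(j) for j in calls_1]
    b = [normalize(j) for j in calls_2]
    return [x for x in a if any(near(x, y, window) for y in b)]


# Made-up coordinates: one chr7-chr12 junction reported in opposite orientations.
caller_1 = [Junction("chr12", 1_000_000, "chr7", 2_000_000)]
caller_2 = [Junction("chr7", 2_000_120, "chr12", 1_000_300)]
print(len(overlapping(caller_1, caller_2)))  # 1
```

The pairwise scan is quadratic in the number of calls, which is negligible for a few hundred events per caller; larger call sets would usually be indexed by chromosome pair first. The choice of window directly changes the overlap count, so it should be reported alongside any concordance figure.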

  8. Whole-genome view of samples 06-0188 and 06-0152. Circle plots show amplifications, deletions, and inter- and intra-chromosomal rearrangements. These 2 samples have 23/25 top dRanger and 21/29 top BamBam events.

  9. Independent events lead to somatic homozygous loss of the tumor suppressors CDKN2A/B. [Figure: chr9 and chr11 in the germline and in the GBM; in the GBM, CDKN2A/B is lost through a segmental deletion on chr9 and through a non-reciprocal translocation of chr9 to chr11 (p15.5-p15.3).]

  10. In 11/16 cases, similar events lead to homozygous loss of CDKN2A/B (Zack Sanborn):

      One copy deleted by       Other copy deleted by                              GBMs
      Arm-level loss of chr9p   Focal loss (via inter-chromosomal translocation)   5
      Arm-level loss of chr9p   Focal loss (mechanism unknown)                     3
      Complex event             Focal loss                                         2
      Complete loss of chr9     Focal loss                                         1
      No loss detected          No loss detected                                   5

  11. Features of CDKN2A/B normal samples

  12. Chromothripsis in a glioblastoma (GBM-0152): chr12, including MDM2, with inter-chromosomal links to chr7 and a LEMD3-C12orf56 fusion.

  13. GBM-0152: amplified regions on chr12 (MDM2), chr7 (EGFR) and chr2 are connected.

  14. EGFR amplification/mutation: 11/17 samples have chr7 amplifications including EGFR, and 4 of these 11 also have EGFRvIII mutations. The exon 2-7 deletion is present at low copy and probably happened after the amplification events. Selection for low copy?
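
One way to see that the exon 2-7 deletion sits at low copy within an EGFR amplicon is to compare read depth over exons 2-7 with depth over the flanking exons of the same amplicon. The sketch below assumes per-exon mean depths are already computed and ignores the two unamplified copies contributed by any normal cells; the function name and the depth values are hypothetical.

```python
from statistics import mean


def deleted_copy_fraction(depth_by_exon: dict[int, float]) -> float:
    """Estimate the fraction of EGFR copies lacking exons 2-7.

    depth_by_exon maps EGFR exon number -> mean tumor read depth over that exon.
    """
    inside = mean(depth_by_exon[e] for e in range(2, 8))  # exons 2..7
    flanking = mean(d for e, d in depth_by_exon.items() if e == 1 or e >= 8)
    # Copies carrying the deletion still cover exon 1 and exons 8+, but not 2-7,
    # so the relative depth deficit over exons 2-7 approximates their share.
    return max(0.0, 1.0 - inside / flanking)


# Hypothetical depths: flanking exons at 400x, deleted exons at 340x.
depths = {1: 400.0, 8: 400.0, 9: 400.0}
depths.update({e: 340.0 for e in range(2, 8)})
print(round(deleted_copy_fraction(depths), 2))  # 0.15
```

A result well below 0.5 would fit the slide's observation that the deletion is carried by only a minority of the amplified copies.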

  15. Example: EGFRvIII mutation

  16. GBMs release exosomes. Could some GBM tumor DNA show up in the blood?

  17. Amplified events may provide enough reads to detect this. GBM TCGA-06-0152: left-hand edge of the EGFR amplicon, connected to chr12.

  18. GBM TCGA-06-0152: left-hand edge of the EGFR amplicon, connected to chr12. Split reads show a similar pattern of mismatches.
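
The reasoning on slides 16-18 is that a junction inside an amplicon is present in many copies per tumor genome, so it yields proportionally more spanning reads than a single-copy breakpoint when tumor DNA is diluted in blood. A minimal sketch, assuming Poisson-distributed read counts, no junction reads from normal DNA, and `depth` measured as mean coverage of a diploid (2-copy) region; the 1% tumor fraction, 30x depth and 40-copy amplicon in the usage lines are illustrative numbers, not data from the talk.

```python
import math


def expected_junction_reads(depth: float, tumor_fraction: float,
                            junction_copies: int) -> float:
    """Mean reads spanning a tumor-only junction.

    Each tumor genome carries `junction_copies` copies of the junction, versus
    the 2 copies of an ordinary diploid locus that `depth` refers to.
    """
    return depth * tumor_fraction * junction_copies / 2


def prob_at_least(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))


# Chance of seeing >= 3 junction reads at 30x depth with 1% tumor-derived DNA:
print(round(prob_at_least(3, expected_junction_reads(30, 0.01, 1)), 4))   # 0.0005
print(round(prob_at_least(3, expected_junction_reads(30, 0.01, 40)), 2))  # 0.94
```

Under these assumptions a single-copy breakpoint is almost never seen three times, while the same breakpoint on a 40-copy amplicon usually is.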

  19. Copy number states, GBM TCGA-06-0185 (Zack Sanborn). [Figure: overall and minority copy number (0-3) along the genome, showing amplification of chr7, chr19 and chr20; normal (diploid) chr9q; single-copy loss of chr10; homozygous deletion of CDKN2A/B on chr9p; chr6p is also labeled.]
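
The states named on this and later slides can be read off a segment's overall copy number together with its minority (lesser-allele) copy number. This Python sketch encodes that mapping; it is an illustrative assumption, and it lumps every state above two copies into "amplification".

```python
def copy_number_state(total_cn: int, minority_cn: int) -> str:
    """Label a segment from its overall and minority (lesser-allele) copy number."""
    if not 0 <= minority_cn <= total_cn // 2:
        raise ValueError("minority copy number must lie in [0, total_cn // 2]")
    if total_cn == 0:
        return "homozygous deletion"
    if total_cn == 1:
        return "single-copy loss"
    if total_cn == 2:
        # Two copies can be one of each allele, or two of the same allele.
        return "normal (diploid)" if minority_cn == 1 else "copy-neutral LOH"
    return "amplification"


for seg in [(2, 1), (1, 0), (0, 0), (2, 0), (4, 1)]:
    print(seg, copy_number_state(*seg))
```

The minority copy number is what separates a normal diploid segment from copy-neutral LOH, which overall copy number alone cannot distinguish.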

  20. Simulated progression model to infer karyotype mixture (Zack Sanborn). [Figure: copy number across chromosomes 1-22 with EGFR and CDKN2A/B marked; inferred mixture proportions: state 1 (tumorigenesis) 23%, state 2a 15%, state 2b 8%, state 3 54%.]
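
The slide does not describe the model's internals. As a hedged illustration of what a karyotype mixture means, the bulk copy number of each segment is the proportion-weighted average of the copy numbers of the cell populations in the sample, and inferring the mixture amounts to inverting this forward model. The proportions below are the ones shown on the slide; the per-state copy numbers at EGFR and CDKN2A/B are invented for illustration.

```python
def mixture_copy_number(proportions: dict[str, float],
                        karyotypes: dict[str, dict[str, int]]) -> dict[str, float]:
    """Proportion-weighted copy number of each segment in a mixed sample.

    proportions: state -> fraction of cells (must sum to 1)
    karyotypes:  state -> {segment: integer copy number}
    """
    if abs(sum(proportions.values()) - 1.0) > 1e-6:
        raise ValueError("proportions must sum to 1")
    segments = next(iter(karyotypes.values())).keys()
    return {seg: sum(p * karyotypes[state][seg] for state, p in proportions.items())
            for seg in segments}


# Proportions as shown on the slide; copy numbers are hypothetical.
proportions = {"1": 0.23, "2a": 0.15, "2b": 0.08, "3": 0.54}
karyotypes = {
    "1": {"EGFR": 2, "CDKN2A/B": 1},
    "2a": {"EGFR": 2, "CDKN2A/B": 0},
    "2b": {"EGFR": 8, "CDKN2A/B": 1},
    "3": {"EGFR": 8, "CDKN2A/B": 0},
}
bulk = mixture_copy_number(proportions, karyotypes)
print({seg: round(cn, 2) for seg, cn in bulk.items()})  # {'EGFR': 5.72, 'CDKN2A/B': 0.31}
```

Non-integer bulk copy numbers such as these are the signal that a sample contains more than one karyotype.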

  21. UCSC Cancer Integration Group: Steve Benz, Josh Stuart (Co-PI), Charlie Vaske*, Zack Sanborn, Jing Zhu, Sam Ng, Chris Szeto, Amie Radenbaugh, James Durbin, Ted Goldstein, Mark Diekhans*, Mia Grifford, Melissa Cline, Dan Carlin, Kyle Ellrott, Brian Craft, Sofie Salama*, Chris Wilks, Artem Sokolov

  22. Allele-specific copy number (Zack Sanborn). [Figure: at matched heterozygous sites, majority- and minority-allele read counts plotted against genomic position for the normal and the tumor; a deletion is marked in the tumor's minority-allele read counts.]
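
A minimal sketch of the read-count comparison this slide depicts, assuming reference and alternate read counts are available at germline heterozygous sites in the matched normal and tumor. In a region where the tumor has lost one allele, the minority-allele fraction stays near 0.5 in the normal but falls toward 0 in the tumor (only as far as normal contamination allows). Function names and counts are hypothetical.

```python
from statistics import median


def minority_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the less-covered allele at one site."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no reads")
    return min(ref_reads, alt_reads) / total


def region_minority_fractions(sites):
    """Median minority-allele fraction in the normal and the tumor over a region.

    sites: iterable of (normal_ref, normal_alt, tumor_ref, tumor_alt) read counts
    at germline heterozygous positions.
    """
    sites = list(sites)
    normal = median(minority_fraction(nr, na) for nr, na, _, _ in sites)
    tumor = median(minority_fraction(tr, ta) for _, _, tr, ta in sites)
    return normal, tumor


# Hypothetical counts for three heterozygous sites inside a deleted region.
sites = [(30, 28, 40, 3), (25, 27, 38, 2), (31, 29, 45, 4)]
normal, tumor = region_minority_fractions(sites)
print(f"normal {normal:.2f}, tumor {tumor:.2f}")  # normal 0.48, tumor 0.07
```

Taking the smaller count at each site biases the fraction slightly upward at low depth; summarizing by the median over many sites keeps single noisy sites from dominating.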

  23. Tumors exhibit multiple rounds of duplication, rearrangement and loss (Zack Sanborn). [Figure: colon sample 5EKFO (Meyerson), overall and minority copy number (0-3) with estimated normal contamination; states labeled single copy, normal (diploid), amplification and CN-LOH.]
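
Normal contamination raises the minority-allele fraction in regions where the tumor has lost an allele, which is what makes it estimable. Under a standard two-population model (an assumption here, not necessarily the method used in the talk) with a fraction alpha of diploid normal cells, a tumor segment with overall copy number n and minority copy number m has an expected minority-allele fraction of (alpha + (1 - alpha) * m) / (2 * alpha + (1 - alpha) * n) at heterozygous sites. The grid search below is a hypothetical sketch that inverts this for an LOH segment.

```python
def expected_minority_fraction(alpha: float, total_cn: int, minority_cn: int) -> float:
    """Expected minority-allele read fraction at a germline heterozygous site.

    alpha: fraction of cells that are normal (diploid, one copy of each allele);
    total_cn, minority_cn: copy numbers of the tumor segment.
    """
    num = alpha * 1 + (1 - alpha) * minority_cn
    den = alpha * 2 + (1 - alpha) * total_cn
    return num / den if den > 0 else float("nan")


def estimate_contamination(observed: float, total_cn: int, minority_cn: int,
                           steps: int = 1000) -> float:
    """Grid-search alpha in [0, 1] to best match an observed minority fraction."""
    grid = (i / steps for i in range(steps + 1))
    return min(grid, key=lambda a: abs(
        expected_minority_fraction(a, total_cn, minority_cn) - observed))


# A single-copy-loss segment (n=1, m=0) with 20% minority-allele reads:
print(estimate_contamination(0.20, total_cn=1, minority_cn=0))  # 0.25
# A copy-neutral LOH segment (n=2, m=0) with 10% minority-allele reads:
print(estimate_contamination(0.10, total_cn=2, minority_cn=0))  # 0.2
```

The same observed fraction implies different contamination depending on the segment's state, so the copy-number state and alpha are usually fitted jointly across many segments rather than from one.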

  24. Copy number profile analysis: ovarian TCGA-13-1411 (Zack Sanborn). [Figure: total copies (0-6) and minority copies (0-3) along the genome with estimated normal contamination; KRAS marked.]

  25. Many rearrangements in amplified regions: MDM2-CDK4, sample 06-0152.
