tools for analyzing cancer variation



  1. Tools for analyzing cancer variation. Ekta Khurana, PhD, Assistant Professor. Meyer Cancer Center; Englander Institute for Precision Medicine; Institute for Computational Biomedicine; Department of Physiology and Biophysics; Weill Cornell Medicine, New York, NY. ekk2003@med.cornell.edu, @ekta_khurana

  2. Timeline of cancer whole-genome sequencing (WGS): first cancer WGS (Ley, Mardis, et al., Nature, 2008); ~500 cancer WGS (Alexandrov et al., Nature, 2013); ~3000 WGS from ICGC/TCGA (2014).

  3. International Cancer Genome Consortium & The Cancer Genome Atlas: ~3000 WGS (tumor & normal), ~1600 RNA-Seq, ~1500 methylation

  4. Most variants are in noncoding regions. Cancer types shown: MB: medulloblastoma; DLBC: B cell lymphoma; STAD: gastric; BRCA: breast; PAAD: pancreatic; PRAD: prostate; LIHC: liver; PA: pilocytic astrocytoma; LUAD: lung adenocarcinoma. Khurana et al., Nature Rev Genet, 2016

  5. Modes of action of noncoding variants: transcription factor binding disruption
     • TERT promoter mutated in many different cancer types (Killela et al., PNAS, 2013; Horn et al., Science, 2013; Huang et al., Science, 2013)
     • MYB motif created & drives TAL1 overexpression in T-ALL (Mansour et al., Science, 2014)
     [Figure: altered binding effects. Gain-of-motif: WT promoter sequence CGGAGG (TF does not bind), mutated CGGAAG (TF binds, mRNA produced). Loss-of-motif: WT TATTTAT (TF binds), mutated TATCTAT (TF does not bind); motif logos shown.]

  6. Co-variates of mutation rates: increased mutation density at TF binding sites in melanoma and lung cancer (Perera et al., Nature, 2016; Sabarinathan et al., Nature, 2016; Khurana, Nature News & Views, 2016)

  7. Outline • Variants with high functional impact: FunSeq • Driver elements with more recurrent and higher functional impact mutations than expected randomly: CompositeDriver

  8. Identifying noncoding variants associated with cancer (Khurana et al., Nature Rev Genet, 2016)

  9. Identifying noncoding variants associated with cancer: FunSeq (Khurana et al., Nature Rev Genet, 2016)

  10. Estimating negative selection
     • Evolutionary conservation: typically defined by comparison across species
     • Conservation among humans: depletion of common variants / enrichment of rare variants
     Fraction of rare variants = (number of rare variants) / (total number of variants)

  11. Enrichment of rare SNPs as a metric for negative selection
     • Depletion of common polymorphisms in regions under selection: negative selection restricts the allele frequency of deleterious mutations.
     • Results for coding genes are consistent with known phenotypic impacts
     • Other metrics for selection: evolutionary conservation (e.g. GERP); SNP density (confounded by mutation rate)
     [Figure: fraction of rare nonsynonymous SNPs (rare = derived allele frequency < 0.5%) for gene categories All Coding, LOF-tol., Recessive, GWAS, Dominant, Essential, Cancer.]
     LOF-tol (loss-of-function tolerant): least negative selection; Cancer: most selection. Khurana et al., Science, 2013
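The rare-variant fraction defined on the previous two slides can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name, the input format (a plain list of derived allele frequencies), and the example frequencies are all hypothetical; only the 0.5% threshold comes from the slide.

```python
def fraction_rare(derived_allele_freqs, rare_threshold=0.005):
    """Fraction of variants with derived allele frequency below the threshold.

    rare_threshold=0.005 follows the slide (rare = DAF < 0.5%).
    """
    freqs = list(derived_allele_freqs)
    if not freqs:
        raise ValueError("need at least one variant")
    n_rare = sum(1 for f in freqs if f < rare_threshold)
    return n_rare / len(freqs)


# Hypothetical derived allele frequencies for two made-up regions.
constrained_region = [0.001, 0.002, 0.0005, 0.004, 0.12]
neutral_region = [0.001, 0.03, 0.2, 0.45, 0.08]

print(fraction_rare(constrained_region))
print(fraction_rare(neutral_region))
```

A higher fraction in one category than another is read, under this metric, as stronger negative selection on that category.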

  12. Organism-level negative selection in noncoding elements (Khurana et al., Science, 2013)

  13. Negative selection and tissue-specificity of coding and noncoding regions
     • Ubiquitously expressed genes and bound regions show stronger selection
     • Differences in constraints amongst tissues
     • Constraints in coding genes and regulatory genes are correlated across tissues

  14. Which noncoding categories are under very strong "coding-like" selection?
     Top categories among 102 ranked categories:
     • Binding peaks of some general TFs (e.g. FAM48A)
     • Core motifs of some TF families (e.g. JUN, GATA)
     • DHS sites in spinal cord and connective tissue
     Enrichment of known disease-causing mutations from the Human Gene Mutation Database: ~40-fold at ~0.4% genomic coverage (~top 25 categories); ~400-fold at ~0.02% genomic coverage (top 5 categories)

  15. Human regulatory network from ENCODE ChIP-Seq
     Pipeline: peak calling (ChIP-Seq); assigning TF binding sites to targets; filtering high-confidence edges using correlation with expression data
     Nodes: 119 TFs and ~9000 target genes
     Edges: ~28,000 interactions (figure labels: proximal edge, distal edge, strong, potential)
     Gerstein¶, ..., Khurana¶, ..., Nature, 2012 (¶ co-first authors); Yip et al., Genome Res, 2012

  16. Gene essentiality and human regulatory network
     Essential genes tend to be central.
     [Figure: network view of LoF-tolerant and essential genes, with TF and non-TF target nodes; size of nodes scaled by total degree; examples marked in-degree = 1 and out-degree = 5; iCAVE movie by Z. Gumus.]
     [Figure: regulatory degree, i.e. total degree (IN + OUT) on a log scale, for LoF-tolerant vs. essential genes; Wilcoxon p-value = 1.29e-2.]
     Gerstein¶, ..., Khurana¶, ..., Nature, 2012 (¶ co-first authors); Khurana et al., PLoS Comp. Bio., 2013
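The in-degree, out-degree and total degree used on this slide can be computed directly from a directed TF-to-target edge list. The sketch below assumes edges are given as (regulator, target) pairs; the gene names and edges are invented for illustration and are not taken from the ENCODE network.

```python
from collections import Counter


def degrees(edges):
    """Return {node: (in_degree, out_degree, total_degree)} for directed edges.

    Each edge is a (regulator, target) pair; duplicate edges are counted once.
    """
    unique_edges = set(edges)
    out_deg = Counter(regulator for regulator, _ in unique_edges)
    in_deg = Counter(target for _, target in unique_edges)
    nodes = set(out_deg) | set(in_deg)
    return {n: (in_deg[n], out_deg[n], in_deg[n] + out_deg[n]) for n in nodes}


# Hypothetical toy network: TF_A regulates five genes and is itself
# regulated by TF_B (in-degree 1, out-degree 5, as in the slide's example).
toy_edges = [("TF_B", "TF_A")] + [("TF_A", f"gene{i}") for i in range(1, 6)]

deg = degrees(toy_edges)
print(deg["TF_A"])
```

Comparing the total-degree distributions of two gene sets (for example with a Wilcoxon rank-sum test, as the slide reports) would be a separate step not shown here.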

  17. Identification of noncoding mutations with high impact: FunSeq

  18. FunSeq2: weighted scoring scheme
     • Feature weight: weighted with mutation patterns in natural polymorphisms (features frequently observed are weighted less); entropy-based method
     [Figure: polymorphisms along the genome overlapping annotated features, e.g. HOT region and sensitive region.]
     Feature weight: $w_d = 1 + p_d \log_2 p_d + (1 - p_d) \log_2 (1 - p_d)$, where $p_d$ = probability of the feature overlapping natural polymorphisms
     For a variant: $\text{score} = \sum_{d \,\in\, \text{observed features}} w_d$
     https://github.com/khuranalab/FunSeq_PCAWG
     http://funseq2.gersteinlab.org
     Fu et al., Genome Biology, 2014
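As a rough numerical illustration of the entropy-based weight on this slide, the sketch below computes w_d = 1 + p_d log2 p_d + (1 - p_d) log2(1 - p_d) and sums the weights over the features a variant overlaps. It is not the FunSeq2 code: the function names, the feature names and the overlap probabilities are hypothetical, and the base-2 logarithm and the plain sum over observed features are assumptions made for this example.

```python
import math


def feature_weight(p):
    """Entropy-based weight: 1 + p*log2(p) + (1-p)*log2(1-p).

    p is the probability that the feature overlaps a natural polymorphism.
    The limit x*log2(x) -> 0 as x -> 0 is used at p = 0 and p = 1.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")

    def plogp(x):
        return 0.0 if x == 0.0 else x * math.log2(x)

    return 1.0 + plogp(p) + plogp(1.0 - p)


def variant_score(observed_features, overlap_prob):
    """Sum of feature weights over the features a variant overlaps."""
    return sum(feature_weight(overlap_prob[f]) for f in observed_features)


# Hypothetical overlap probabilities (not values from FunSeq2).
overlap_prob = {"sensitive_region": 0.01, "HOT_region": 0.05, "common_feature": 0.5}

print(feature_weight(0.01))  # rarely polymorphic feature: weight close to 1
print(feature_weight(0.5))   # frequently polymorphic feature: weight 0
print(variant_score(["sensitive_region", "HOT_region"], overlap_prob))
```

Features that rarely overlap polymorphisms get weights near 1, so a variant hitting several such features accumulates a high score.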

  19. Identifying noncoding variants associated with cancer: FunSeq, CompositeDriver

  20. CompositeDriver for detecting driver coding & noncoding elements (Eric Minwei Liu)
     (A) Alterations are functionally annotated by the FunSeq2 pipeline ($FS_i$ = original FunSeq2 score of variant $i$).
     (B) Calculate the positional recurrence of each mutation in the cohort (across samples).
     (C) Within each functional region, the composite functional score ($CFS_r$) is the sum of recurrence multiplied by FunSeq2 score over each position with an alteration: $CFS_r = \sum_{i=1}^{n} W_i \, FS_i$, where $r$ = region (cds, promoter, enhancer and lincRNA), $n$ = number of variants in $r$, and $W_i$ = number of samples with variant $i$.
     (D) A p-value for each region is produced from a permutation test, and the Benjamini and Hochberg method is used to correct for multiple hypothesis testing.
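Steps (B) to (D) can be sketched as follows. This is an illustrative reading of the slide, not the CompositeDriver implementation: the input format, function names and example values are hypothetical, each position is assumed to carry a single FunSeq2 score, and the permutation test itself is left out because the slide does not specify its null model, so per-region p-values are taken as given.

```python
from collections import defaultdict


def composite_functional_score(variants):
    """CFS for one region: sum over mutated positions of W_i * FS_i.

    variants: iterable of (position, funseq2_score, sample_id) tuples.
    W_i is the number of distinct samples mutated at position i.
    Assumes one FunSeq2 score per position (a simplification).
    """
    samples_at = defaultdict(set)
    score_at = {}
    for position, score, sample in variants:
        samples_at[position].add(sample)
        score_at[position] = score
    return sum(len(samples_at[pos]) * score_at[pos] for pos in samples_at)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical promoter: position 100 mutated in two samples, 150 in one.
promoter_variants = [(100, 2.0, "s1"), (100, 2.0, "s2"), (150, 1.5, "s3")]
print(composite_functional_score(promoter_variants))

# Hypothetical permutation-test p-values for four regions.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Weighting recurrence by functional score means a region can rank highly either through a few highly recurrent positions or through many positions with high predicted impact.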

  21. Results from 40 lung adenocarcinoma samples. [Figure: Q-Q plots against expected (-logP) for coding sequences, promoters, and lincRNA.] Data from TCGA

  22. Results from 188 prostate cancer samples. [Figure: Q-Q plots of observed (-logP) vs. expected (-logP) for SNVs in coding regions, promoters, and enhancers; SPOP is labeled.] Data from ICGC; Baca et al., Cell, 2013; Berger et al., Nature, 2011

  23. Functional validation of candidates in prostate cancer
     [Figure: activity of the RET, EIF4EBP3 and WDR74 promoters, with panels labeled increased activity and reduced activity; WDR74 expression in PCa vs. benign samples.]
     • Sanger sequencing in 19 additional samples confirms the recurrence
     • WDR74 shows increased expression in tumor samples
     In collaboration w/ Mark Rubin

  24. Acknowledgements. ~40 institutes, ~550 participants; Functional Interpretation Group, ~50 participants
     Yale: Yao Fu (now at Bina), Xinmeng Mu (now at Broad), Jieming Chen, Lucas Lochovsky, Arif Harmanci, Alexej Abyzov, Suganthi Balasubramanian, Cristina Sisu, Declan Clarke, Mike Wilson, Yong Kong, Mark Gerstein
     U of Michigan: Hyun Min Kang
     U of Geneva: Tuuli Lappalainen (NYGC), Emmanouil T. Dermitzakis
     Baylor: Daniel Challis, Uday Evani, Donna Muzny, Fuli Yu, Richard Gibbs
     Sanger: Vincenza Colonna, Yuan Chen, Yali Xue, Chris Tyler-Smith
     EBI: Kathryn Beal, Laura Clarke, Fiona Cunningham, Paul Flicek, Javier Herrero, Graham R. S. Ritchie
     Cornell: Steven Lipkin, Jishnu Das, Robert Fragoza, Xiaomu Wei, Haiyuan Yu, Andrea Sboner, Dimple Chakravarty, Naoki Kitabayashi, Vaja Liluashvili, Zeynep H. Gümüş, Kellie Cotter, Mark A. Rubin
     Boston College: Erik Garrison, Gabor Marth
     Mass Gen Hospital: Kasper Lage, Daniel G. MacArthur, Tune H. Pers
     Rutgers: Jeffrey A. Rosenfeld

  25. Sandra and Edward Meyer Cancer Center; Englander Institute for Precision Medicine; Institute for Computational Biomedicine
     Khurana lab: Eric Minwei Liu, Priyanka Dhingra, Alexander Fundichely, Tawny Cuykendall
     Andrea Sboner, Mark Rubin, Dimple Chakravarty, Kellie Cotter, Steve Lipkin, Chason Lee
     Postdoc positions available: khuranalab.med.cornell.edu/jobs, ekk2003@med.cornell.edu
