
Protein Domain-Centric Approach to Study Cancer Somatic Mutations

Protein Domain-Centric Approach to Study Cancer Somatic Mutations from High-throughput Sequencing Studies. Dr. Maricel G. Kann, Assistant Professor, Dept of Biological Sciences, UMBC


  1. Protein Domain-Centric Approach to Study Cancer Somatic Mutations from High-throughput Sequencing Studies Dr. Maricel G. Kann Assistant Professor Dept of Biological Sciences UMBC

  2. The term protein domain (or domain) refers to a region of the protein with compact structure, usually with a hydrophobic core.

  3. Protein Domains
     - Domains represent the functional units of proteins.
     - Protein domains mediate 75% of protein-protein interactions.
     - Most proteins are multi-domain (65% of eukaryotic and 40% of prokaryotic proteins).

     Aloy and Russell, 2006

  4. Reduces the space of inquiry
     - ≈ 22,000 human genes
     - ≈ 34,500 human RefSeq proteins
     - Over 550,000 human proteins from all databases listed in NCBI
     - Fewer than 4,500 human protein domains

  5. Majority of Disease Mutations are Inside Domains
     Figure: pie charts of Swiss-Prot polymorphisms and Swiss-Prot disease mutations, split into inside and outside domains. Slice labels: 18%, 48%, 52%, 82%.

  6. Different domains in the same protein may play different roles
     Figure: SPTB protein (spectrin beta chain, erythrocytic). Labels: spherocytosis mutation, elliptocytosis mutations, actin binding domains, helix forming domains.
     Zhong et al., Edgetic perturbation models of human inherited disorders. Mol. Syst. Biol. 5, 321 (2009)

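  7. Protein Domain Disease Hotspots
     Figure: a domain shared by Protein 1, Protein 2 and Protein 3 is aligned into the DMDM domain view. Domain disease mutation counts per position: 1, 1, 3; the position with 3 mutations is marked as the hotspot.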
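
The aggregation in this figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `alignment` dictionary is a hypothetical stand-in for a real protein-to-domain alignment, the protein names and positions are invented toy data, and the function name `domain_position_counts` is not part of DMDM.

```python
from collections import Counter

def domain_position_counts(mutations, alignment):
    """Count disease mutations per domain position.

    mutations: iterable of (protein_id, protein_position) pairs.
    alignment: dict mapping (protein_id, protein_position) to a domain
               position (hypothetical stand-in for a real domain alignment).
    """
    counts = Counter()
    for key in mutations:
        domain_pos = alignment.get(key)
        if domain_pos is not None:  # mutations outside the domain are skipped
            counts[domain_pos] += 1
    return counts

# Toy data (invented): three proteins share one domain.
alignment = {
    ("protein1", 40): 1,
    ("protein2", 115): 2,
    ("protein1", 50): 3,
    ("protein2", 120): 3,
    ("protein3", 88): 3,
}
mutations = [
    ("protein1", 40), ("protein2", 115),
    ("protein1", 50), ("protein2", 120), ("protein3", 88),
    ("protein3", 500),  # lies outside the shared domain
]
counts = domain_position_counts(mutations, alignment)
hotspot = max(counts, key=counts.get)  # domain position with the most mutations
```

With this toy input, domain positions 1 and 2 each collect one mutation and position 3 collects three, mirroring the 1, 1, 3 counts on the slide.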

  8. Protein Domains: Human CFTR

  9. http://bioinf.umbc.edu/DMDM

  10. - ABCC_CFTR1 domain (nucleotide binding domain 1)
      - Significant hotspot at position 172

  11. DS-Score
      - The DS-Score (domain significance score) is a statistical measure designed to identify significantly mutated domain positions.

  12. Data and Methods
      - DS-Score (domain significance score)
      - Derived from the probability that a domain position contains its observed number of disease mutations, given the domain length and the total number of mutations mapping to the domain.
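
A minimal sketch of a score of this kind follows. It is not the published DS-Score formula: it assumes, purely for illustration, that each of the mutations mapping to the domain lands on any position with equal probability (1 / domain length), takes the binomial upper-tail probability of the observed count, and reports its negative log10. The function names are hypothetical.

```python
from math import exp, lgamma, log, log1p, log10

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)

def position_score(k, n_total, domain_length):
    """Illustrative significance score for one domain position.

    k: disease mutations observed at the position.
    n_total: all disease mutations mapping to the domain.
    domain_length: number of positions in the domain (must be > 1).
    Assumes a uniform null model; this is an assumption, not the source's method.
    """
    p = 1.0 / domain_length
    p_value = binomial_tail(k, n_total, p)
    return -log10(max(p_value, 1e-300))  # guard against underflow to zero

# Toy comparison (invented numbers): 20 mutations on a 100-position domain.
low = position_score(1, 20, 100)
high = position_score(5, 20, 100)
```

Under this null model a position holding 5 of the 20 mutations scores far higher than one holding a single mutation, which is the behavior a hotspot score needs.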

  13. Study of Domain Hotspots for Disease
      - DMDM reveals domain hotspots for both cancer and non-cancer disease mutations (non-cancers are mostly Mendelian diseases).
      - Use the DS-Score to analyze disease mutation data.
      - Are the mutation hotspot profiles different for these two classes of disease?
      - Are the mutation hotspot profiles different for known oncogenes and tumor suppressors?

  14. Mutations at Domain Hotspots

      | | Cancer | Non-cancer | Randomized Cancer | Randomized Non-cancer |
      |---|---|---|---|---|
      | Mutations at position-based hotspots | 13.7% | 6.4% | (0.06 ± 0.03)% | (0.06 ± 0.03)% |
      | Mutations at feature-based hotspots | 29.2% | 10.5% | (0.06 ± 0.03)% | (0.06 ± 0.03)% |
      | Mutations at positions with ≥ 2 mutations | 54.4% | 58.8% | (33.2 ± 2.2)% | (30.8 ± 3.1)% |

      - P-values ≈ 0.0 for position- and feature-based hotspots
      - P-value < 0.05 for mutations at positions with ≥ 2 mutations

      Peterson, T.A., et al. (2012) J Am Med Inform Assoc 19, 275-83.

  15. Hotspots at Highly Conserved Positions

      | | Cancer | Non-cancer |
      |---|---|---|
      | Position-based hotspots at conserved positions | 58.1% | 51.2% |
      | Feature-based hotspots at conserved positions | 67.6% | 61.7% |

      | | Cancer | Non-cancer |
      |---|---|---|
      | Correlation coefficient (DS-Score, Conservation Score) | 0.19 | 0.10 |

      Peterson, T.A., et al. (2012) J Am Med Inform Assoc 19, 275-83.

  16. DS-Score Distributions

  17. Cancer Genes with Hotspots Above DS-Score 9.5

      | Gene | Type | Function |
      |---|---|---|
      | ALK | Oncogene | Receptor kinase |
      | BRAF | Oncogene | Protein kinase |
      | EGFR | Oncogene | Receptor kinase |
      | FLT3 | Unknown | Receptor kinase |
      | GNAI2 | Unknown | GTPase |
      | GNAS | Oncogene | GTPase |
      | HRAS | Oncogene | GTPase |
      | KIT | Oncogene | Receptor kinase |
      | KRAS | Oncogene | GTPase |
      | MET | Oncogene | Receptor kinase |
      | NRAS | Oncogene | GTPase |
      | PDGFRA | Oncogene | Receptor kinase |
      | RRAS2 | Oncogene | GTPase |

  18. Cancer Genes with Hotspots Below DS-Score 9.5

      | Gene | Type | Function |
      |---|---|---|
      | ABL1 | Oncogene | Protein kinase |
      | CDK4 | Unknown | Protein kinase |
      | CHEK2 | Tumor Suppressor | Cell cycle regulator |
      | FGFR3 | Unknown | Receptor kinase |
      | MAP2K3 | Unknown | Protein kinase |
      | MEN1 | Tumor Suppressor | DNA repair (unclear) |
      | NF1 | Tumor Suppressor | RAS pathway regulator |
      | NTRK1 | Oncogene | Receptor kinase |
      | PIK3CA | Oncogene | Lipid kinase |
      | PTPN11 | Oncogene | Protein phosphatase |
      | RET | Oncogene | Receptor kinase |
      | STK11 | Tumor Suppressor | Protein kinase |
      | TGFBR2 | Unknown | Receptor kinase |
      | WT1 | Tumor Suppressor | Transcription factor |

  19. DS-Score for Variant Classification
      Figure: flowchart with the labels Novel Variants, Map to Domain, DS-Score, Hotspot (*), Likely Deleterious, Likely Neutral.

  20. DS-Score for Variant Classification

      | Method | Specificity (%) | Precision (%) | Precision of DS-Score with LogR.E-value (%) |
      |---|---|---|---|
      | SIFT (1) | 76.2 | 82.0 | N/A |
      | LogR.E-value (2) | 78.2 | 81.3 | N/A |
      | Position-based DS-Score | 99.5 | 91.6 | 95.7 |
      | Feature-based DS-Score | 98.6 | 87.2 | 91.0 |
      | Domain positions with ≥ 2 mutations | 94.2 | 85.6 | 91.9 |

      Sensitivity: 3.3%, 6.5% and 20.5%

      1. Ng, P.C. and Henikoff, S. (2003) NAR, 31, 3812-14.
      2. Clifford, R.J., M.N. Edmonson, C. Nguyen, et al., Bioinformatics, 2004. 20(7): p. 1006-14.
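
For readers less familiar with the three metrics in this table, the sketch below shows their standard definitions from a confusion matrix. The counts are invented toy values, not data from the study, and the function name is hypothetical; the code assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics, returned as fractions in [0, 1].

    tp, fn: deleterious variants called deleterious / missed.
    tn, fp: neutral variants called neutral / wrongly called deleterious.
    Assumes each denominator below is non-zero.
    """
    return {
        "specificity": tn / (tn + fp),  # neutral variants correctly left alone
        "precision": tp / (tp + fp),    # deleterious calls that are correct
        "sensitivity": tp / (tp + fn),  # deleterious variants actually found
    }

# Invented toy counts: a conservative classifier that makes few calls.
metrics = classification_metrics(tp=5, fp=1, tn=99, fn=95)
```

The toy classifier calls very few variants deleterious, so its specificity (0.99) and precision (about 0.83) are high while its sensitivity (0.05) is low. This is the same qualitative pattern as the hotspot-based rows above, which pair high specificity and precision with low reported sensitivity.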

  21. Scoring gene peaks and “hills”
      - Cancer Mutation Prevalence (CaMP) score
      - The frequency of mutations in different contexts varies across cancers.
      - CaMP scores consider neighboring bases (25 contexts).

      Wood, L.D., D.W. Parsons, S. Jones, et al., The genomic landscapes of human breast and colorectal cancers. Science, 2007. 318(5853): p. 1108-13.

  22. From Gene to Domain Landscape
      Each point in the grid of a domain landscape represents a domain, and the height of its peak is estimated by aggregating all mutations across all human proteins that contain that domain.

  23. Scoring Domain Landscapes
      - We used domain-based counts of mutations and accounted for the different mutational contexts.
      - We estimated the DL-Score, or domain landscape score, from a binomial distribution that considers mutational context and aggregates all mutations across all human proteins with the domain.
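
The idea of a context-aware binomial score can be sketched as below. This is a simplified illustration, not the DL-Score as published: it pools hypothetical per-context background rates into a single expected per-base mutation probability for the domain and then takes the negative log10 of the binomial upper tail. The context names, rates and base counts are all invented, and only two contexts are used instead of the 25 mentioned for CaMP scores.

```python
from math import exp, lgamma, log, log1p, log10

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)

def domain_landscape_score(observed, bases_by_context, rate_by_context):
    """Illustrative domain-level score (an assumption, not the source's formula).

    observed: mutations in the domain, aggregated over all proteins carrying it.
    bases_by_context: number of domain bases in each mutational context.
    rate_by_context: hypothetical background mutation rate per base and context.
    """
    n_bases = sum(bases_by_context.values())
    expected = sum(bases_by_context[c] * rate_by_context[c]
                   for c in bases_by_context)
    pooled_rate = expected / n_bases  # context-weighted per-base probability
    p_value = binomial_tail(observed, n_bases, pooled_rate)
    return -log10(max(p_value, 1e-300))  # guard against underflow to zero

# Invented toy input: 2,000 domain bases split over two contexts.
bases = {"CpG": 200, "other": 1800}
rates = {"CpG": 5e-3, "other": 1e-3}
score_few = domain_landscape_score(3, bases, rates)
score_many = domain_landscape_score(10, bases, rates)
```

Pooling the contexts into one rate keeps the example short; a fuller treatment would score each context separately before combining them.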

  24. The Cancer Genome Atlas
      The TCGA sequencing projects that we used were:
      - 100 colon adenocarcinoma patients
      - 522 breast invasive carcinoma patients
      - 253 lung adenocarcinoma patients

  25. Domain Landscape of Colon Cancer
      Summary of somatic mutations occurring in the exomes of 100 colon cancer tumor samples. Synonymous SNVs and variants present in dbSNP (release 130) were removed due to their low likelihood of being driver mutations.

      | Category | Value |
      |---|---|
      | Total patients | 100 |
      | Total mutations | 21,572 |
      | Total nonsynonymous SNVs | 17,174 (79.6%) |
      | Total frameshift insertions | 2,527 (11.7%) |
      | Total nonframeshift insertions | 239 (1.1%) |
      | Total frameshift deletions | 5 (0.0%) |
      | Total nonframeshift deletions | 0 (0.0%) |
      | Total stop-loss SNVs | 33 (0.2%) |
      | Total stop-gain SNVs | 1,594 (7.4%) |
      | Mutations in domain regions | 10,647 (49.4%) |
      | Average mutations per patient | 216 (± 552) |
      | Number of mutations per patient | 21-4,880 |
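
As a quick consistency check on this summary, the counts from the slide can be recomputed into percentages. The sketch below uses only the numbers given above; the variable names are illustrative.

```python
# Mutation counts by type, copied from the slide.
counts = {
    "nonsynonymous SNVs": 17174,
    "frameshift insertions": 2527,
    "nonframeshift insertions": 239,
    "frameshift deletions": 5,
    "nonframeshift deletions": 0,
    "stop-loss SNVs": 33,
    "stop-gain SNVs": 1594,
}
patients = 100
in_domains = 10647

total = sum(counts.values())  # 21572, matching "Total mutations"
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
domain_share = round(100 * in_domains / total, 1)  # 49.4
mean_per_patient = round(total / patients)         # 216
```

The seven categories sum to the reported total of 21,572, and the recomputed shares (79.6%, 11.7%, 1.1%, 0.0%, 0.0%, 0.2%, 7.4%) agree with the slide, as do the 49.4% of mutations in domain regions and the average of 216 mutations per patient.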

  26. Gene and Domain Landscapes

  27. Selected domains highly mutated in colon cancer tumors
      FILIP1L is known to inhibit proliferation and migration and to increase apoptosis in endothelial cells. It acts as a tumor suppressor, and its loss of function has been implicated in ovarian cancer, head and neck squamous cell carcinoma and oligodendrogliomas [38,39].
      Nehrt LN, Peterson TA, Park DH and Kann MG, Domain Landscapes of somatic mutations in cancer. BMC Genomics. 13 (2012)

  28. Shared gene and domain peaks in colon and breast cancer landscapes
      Nehrt LN, Peterson TA, Park DH and Kann MG, Domain Landscapes of somatic mutations in cancer. BMC Genomics. 13 (2012)

  29. PIK3CA domain prevalence
      Nehrt LN, Peterson TA, Park DH and Kann MG, Domain Landscapes of somatic mutations in cancer. BMC Genomics. 13 (2012)

  30. Advantages of using domain-centric approaches for analysis of disease mutations
      - The domain view gives the functional context of the mutation.
      - The domain view reduces the space of inquiry.
      - The majority of disease mutations in coding regions occur inside domains.

  31. Summary Part I (mutations with known significance to phenotype)
      - Disease mutations tend to cluster significantly at certain domain positions.
      - The DS-Score, or domain significance score, is derived from known disease mutations.
      - The DS-Score can be used to classify mutations and will benefit from the growth of disease mutation databases.

  32. Summary Part II (mutations with unknown significance to phenotype)
      - The domain landscape allows visualization of clusters of cancer somatic mutations at the domain level.
      - The domain landscape score is derived from the analysis of tumor mutations from exome or whole-genome sequencing data.
      - A gradient of mutation prevalence in cancer studies can be found across the different domains of a gene (PIK3CA).

  33. Thanks!
