Evolutionary Conservation of Human Phosphorylation Sites
Javad Safaei (1), Jan Manuch (1), Arvind Gupta (1), Ladislav Stacho (2), Steven Pelech (3)
1. UBC, Department of Computer Science
2. SFU, Department of Mathematics
3. UBC, Department of Medicine, and Kinexus Bioinformatics Corporation
1 1:02 AM
Cell Signaling Network
• The human body consists of different types of cells
• 23,000 different protein types in cells
• Cell types differ in the level of each protein type
• Defects in the cell signaling network lead to 400 diseases (especially cancer, diabetes, and Alzheimer's)
• Modeling the network is useful for drug discovery
Cell Phosphorylation Network
• Network defects correlate with 400 diseases (e.g., cancer)
• Phosphorylation
  ◦ An important post-translational modification (PTM)
  ◦ Protein kinases phosphorylate substrates, protein phosphatases dephosphorylate substrates, and phospho-dependent proteins bind to the phosphate and move around it
  ◦ Can change protein function and 3D structure dramatically
• Phosphosites
  ◦ Occur only on serine (S), threonine (T), tyrosine (Y), and rarely histidine (H)
  ◦ Fall into two main groups:
    Inhibitory: inhibit the protein's activity
    Activatory: activate the protein
Phosphosites Conservation
• Why?
  ◦ Correlate conservation with inhibition/activation sites
  ◦ Correlate conservation with confirmed disease mutation data
  ◦ Investigate conservation in S, T, and Y sites
  ◦ Measure how often phosphates (negative moiety) are replaced by the negatively charged amino acids aspartic acid (D) and glutamic acid (E)
• Conservation of sites requires conservation of proteins, which in turn requires recognizing the orthologs of human proteins in other species.
Orthologs Recognition
• The most similar protein in the other species is taken as the ortholog
• A certain threshold of similarity is required for an ortholog
• Global sequence alignment is the similarity measure
• Protein orthologs are aligned with blue rectangles
• The number of proteins differs between species
Orthologs Recognition
• The protein databases are big, so the search needs to be fast and accurate
• For each species, build the BLAST database from FASTA sequences
    Species_DB <= formatdb -i Species_Seqs -p T -o T
  -p T indicates protein sequences, and -o T creates indices in the results.
• For each human protein, run a BLAST search on each formatted species database and retrieve the top five candidate proteins
    Top_5_Orthologs <= blastp -i Input_Seq -d Species_DB -b 5
• BLAST is an imperfect database search, so for each candidate protein compute the global alignment based on Needleman-Wunsch.
• The candidate with the highest percent identity is chosen as the ortholog of the human protein in that species.
• The procedure correctly finds each protein itself when run against the human protein database.
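
The re-ranking step above (global alignment of each BLAST candidate, then picking the highest percent identity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the scoring values (match 1, mismatch -1, gap -1), the definition of percent identity as identical columns over alignment length, the 30% threshold, and the names `needleman_wunsch`, `percent_identity`, and `best_ortholog` are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[str, str]:
    """Globally align a and b; return the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical (non-gap) columns divided by alignment length, in percent."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


def best_ortholog(human_seq: str, candidates: Dict[str, str],
                  min_identity: float = 30.0) -> Optional[Tuple[str, float]]:
    """Pick the candidate with the highest percent identity above a threshold.

    min_identity is a hypothetical cut-off; the slides do not state the value.
    """
    best = None
    for name, seq in candidates.items():
        identity = percent_identity(*needleman_wunsch(human_seq, seq))
        if identity >= min_identity and (best is None or identity > best[1]):
            best = (name, identity)
    return best
```

Calling `best_ortholog` with the human sequence itself among the candidates returns that sequence at 100% identity, which mirrors the self-hit check mentioned on the slide.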
Conservation of Phosphosites
• Phosphosites are analyzed through regions r1, r2, r3 (subsequences) centered at each site (15 residues in our case)
• This region is well known in biology, and the specificity of kinases and phosphatases is defined by it.
• Globally aligning human proteins (p_h) with species orthologs (p_s) automatically aligns the phospho-regions, but with a high probability of gaps.
• We modified Needleman-Wunsch global alignment to place gaps outside the phospho-regions and to predict more sites in the ortholog: constrained global alignment (CGA)
• Some sites (r3) are aligned with amino acids other than S, T, Y (we do not count those cases in the statistics).
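
A phospho-region as described above is a fixed-width window centered on the site. The sketch below extracts such a window; the 1-based site numbering, the `-` padding used when a site lies near a protein end, and the function name `phospho_region` are assumptions for illustration, since the slides do not say how terminal sites are handled.

```python
def phospho_region(protein: str, site: int, width: int = 15,
                   pad: str = "-") -> str:
    """Return the width-residue window centered on the 1-based position site.

    Windows that run past either end of the protein are padded with pad
    (an assumption; the slides do not specify the treatment of terminal sites).
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the site sits in the center")
    if not 1 <= site <= len(protein):
        raise ValueError("site is outside the protein")
    half = width // 2
    center = site - 1                      # convert to a 0-based index
    left, right = center - half, center + half + 1
    window = protein[max(left, 0):min(right, len(protein))]
    return pad * max(-left, 0) + window + pad * max(right - len(protein), 0)
```

For example, `phospho_region("ACDEFGHIKLMNPQRSTVWY", 16)` yields a 15-character string with the serine at position 16 in the middle and three padding characters on the right.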
Constrained Global Alignment (CGA)
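
The deck does not spell out the CGA recurrence, so the following is only one plausible reading of "gaps outside the phospho-regions": a Needleman-Wunsch variant in which a protected human residue may not be aligned to a gap and no residue may be inserted between two adjacent protected residues. The scoring values, the `protected` set of 0-based human indices, and the function name are assumptions, and the site-prediction part of CGA is not covered.

```python
from typing import Set, Tuple

NEG_INF = float("-inf")


def constrained_global_alignment(human: str, ortholog: str,
                                 protected: Set[int], match: int = 1,
                                 mismatch: int = -1,
                                 gap: int = -1) -> Tuple[str, str]:
    """Global alignment that keeps gaps out of protected human positions."""
    n, m = len(human), len(ortholog)

    def deletion_ok(i: int) -> bool:
        # Human residue i (1-based) aligned to a gap in the ortholog.
        return (i - 1) not in protected

    def insertion_ok(i: int) -> bool:
        # Ortholog residue inserted right after human residue i (1-based).
        return not (0 < i < n and (i - 1) in protected and i in protected)

    score = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            options = []
            if i > 0 and j > 0:
                sub = match if human[i - 1] == ortholog[j - 1] else mismatch
                options.append((score[i - 1][j - 1] + sub, "diag"))
            if i > 0 and deletion_ok(i):
                options.append((score[i - 1][j] + gap, "up"))
            if j > 0 and insertion_ok(i):
                options.append((score[i][j - 1] + gap, "left"))
            if options:
                score[i][j], move[i][j] = max(options)
    if score[n][m] == NEG_INF:
        raise ValueError("no alignment satisfies the constraints")

    # Follow the stored moves back to the origin.
    out_h, out_s = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            out_h.append(human[i - 1])
            out_s.append(ortholog[j - 1])
            i -= 1
            j -= 1
        elif step == "up":
            out_h.append(human[i - 1])
            out_s.append("-")
            i -= 1
        else:
            out_h.append("-")
            out_s.append(ortholog[j - 1])
            j -= 1
    return "".join(reversed(out_h)), "".join(reversed(out_s))
```

With `human="ACDSEFG"`, `ortholog="ACDEFG"`, and `protected={2, 3, 4}`, an unconstrained alignment would open the gap opposite the serine; here the gap is forced out of the `DSE` stretch, so the serine is aligned to a residue of the ortholog.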
Phosphosite Prediction in Human
Prediction          Species                 # Proteins   P-Ser   P-Thr   P-Tyr  Total Sites
Yeast to Human      Yeast                        1,542   8,184   1,855       0       10,039
                    Human                          311     225     126       9          360
                    Ratio (Human/Yeast)         20.17%   2.75%   6.79%      NA        3.59%
Worm to Human       Worm                           696   3,060     440     114        3,614
                    Human                          369     178      82      27          287
                    Ratio (Human/Worm)          53.02%   5.82%  18.64%  23.68%        7.94%
Fruit fly to Human  Fruit Fly                    3,956  11,556   3,495     705       15,756
                    Human                        1,676   1,666     917     188        2,771
                    Ratio (Human/Fruit Fly)     42.37%  14.42%  26.24%  26.67%       17.59%
Total Predicted     Human                        2,356   2,069   1,125     224        3,418
• Sites gathered from PhosphoSitePlus, Phospho.ELM, PHOSIDA, and the literature
• Prediction of over 3,000 phosphosites by constrained global alignment from 30,000 sites in 3 different species.
• (T, Y)-sites are more conserved than S-sites.
• Zero Y-sites in yeast still lead to 9 Y-sites in human (i.e., S or T has changed to Y in human)
• The more similar the species is to human, the more sites are predicted in human.
Phosphosite Prediction in Species
• Using 90K experimentally confirmed phosphosites in human
• Prediction of about 620K sites in 19 species
• Availability: www.phosphonet.ca includes exact proteins and sites information
• The more distant the species is from human, the higher the Thr/Ser ratio
 #  Species                   P-Ser    P-Thr    P-Tyr  Thr/Ser      All
    Human                    53,478   16,971   18,849      32%   89,298
 1  Mouse                    45,096   14,344   16,598      32%   76,038
 2  Dog                      42,479   13,605   15,830      32%   71,914
 3  Chimpanzee               41,471   14,030   15,227      34%   70,728
 4  Rhesus macaque           40,163   13,228   14,735      33%   68,126
 5  Rat                      39,733   13,437   14,672      34%   67,842
 6  Chicken                  30,333   11,233   12,566      37%   54,132
 7  Brachydanio rerio        26,669   11,045   11,050      41%   48,764
 8  Duckbill platypus        24,467    9,035   10,023      37%   43,525
 9  African clawed frog      19,780    8,617    8,911      44%   37,308
10  Fruit fly                 9,665    5,878    4,698      61%   20,241
11  Purple sea urchin         8,156    4,709    3,489      58%   16,354
12  Honeybee                  6,766    4,219    3,440      62%   14,425
13  Nematode worm             5,364    3,390    2,846      63%   11,600
14  Baker's yeast             3,135    2,223    1,661      71%    7,019
15  Mouse-ear cress           3,070    1,752    1,444      57%    6,266
16  Red bread mold              791      671      557      85%    2,019
17  Maize                       693      419      488      60%    1,600
18  Western balsam poplar       747      430      371      58%    1,548
19  Tammar wallaby               38       31       23      82%       92
    Total Predicted Sites   348,616  132,296  138,629       NA  619,541
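
The Thr/Ser column appears to be the P-Thr count divided by the P-Ser count, rounded to a whole percent. The short check below reproduces three rows of the table under that assumption; the function name `thr_ser_ratio` is illustrative only.

```python
def thr_ser_ratio(p_ser: int, p_thr: int) -> int:
    """P-Thr count as a percentage of the P-Ser count, rounded to an integer."""
    return round(100 * p_thr / p_ser)


# (P-Ser, P-Thr) pairs copied from the table above.
rows = {
    "Human": (53478, 16971),
    "Fruit fly": (9665, 5878),
    "Baker's yeast": (3135, 2223),
}

for species, (p_ser, p_thr) in rows.items():
    print(f"{species}: {thr_ser_ratio(p_ser, p_thr)}%")
```

The three values rise from human to fruit fly to yeast, in line with the trend stated on the slide.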
Human Phosphosites Scores
• Avg activation score
  ◦ Checks whether the negatively charged phosphate group is replaced by aspartic acid (D) or glutamic acid (E) in other species to keep the functionality.
• Avg conservation score
  ◦ Identity conservation
  ◦ Similarity conservation
• Divide by the number of found phospho-regions (fewer than 20)
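
The exact scoring formulas are not given on the slide, so the sketch below shows only the averaging idea: a per-site identity score averaged over the ortholog phospho-regions that were found, and a simple fraction of orthologs whose central residue is D or E. Both functions and their names are hypothetical simplifications and are not the scores reported in the results.

```python
from typing import List


def percent_identity(region_h: str, region_s: str) -> float:
    """Percentage of positions at which two equal-length regions agree."""
    if len(region_h) != len(region_s):
        raise ValueError("regions must have the same length")
    same = sum(1 for x, y in zip(region_h, region_s) if x == y)
    return 100.0 * same / len(region_h)


def avg_identity_conservation(human_region: str,
                              ortholog_regions: List[str]) -> float:
    """Mean percent identity over the ortholog regions that were found."""
    if not ortholog_regions:
        return 0.0
    total = sum(percent_identity(human_region, r) for r in ortholog_regions)
    return total / len(ortholog_regions)


def acidic_replacement_fraction(ortholog_regions: List[str]) -> float:
    """Fraction of found regions whose central residue is D or E."""
    if not ortholog_regions:
        return 0.0
    hits = sum(1 for r in ortholog_regions if r[len(r) // 2] in "DE")
    return hits / len(ortholog_regions)
```

Dividing by `len(ortholog_regions)` corresponds to the "divide by the number of found phospho-regions" step, so a site with few recognised orthologs is not penalised for the missing ones.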
Amino Acids Similarity
• To compute the percent similarity of phospho-regions, the following graph is suggested by experience.
• Edges mean similarity
• Different from the BLOSUM matrix, which is for conservation
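
Percent similarity over a graph can be computed by counting positions where the two residues are identical or joined by an edge. The graph itself is not reproduced in this text, so the three edges below (S-T, D-E, K-R) are placeholders chosen only to make the example runnable; they are not the authors' edge set, and `SIMILAR_PAIRS` and `percent_similarity` are hypothetical names.

```python
# Placeholder edges; the actual similarity graph from the slide is not available.
SIMILAR_PAIRS = {frozenset("ST"), frozenset("DE"), frozenset("KR")}


def are_similar(x: str, y: str) -> bool:
    """True when the residues are identical or connected by an edge."""
    return x == y or frozenset((x, y)) in SIMILAR_PAIRS


def percent_similarity(region_h: str, region_s: str) -> float:
    """Percentage of aligned positions whose residues are similar."""
    if len(region_h) != len(region_s):
        raise ValueError("regions must have the same length")
    hits = sum(1 for x, y in zip(region_h, region_s) if are_similar(x, y))
    return 100.0 * hits / len(region_h)
```

Because an edge is stored as an unordered pair, the relation is symmetric: `are_similar("S", "T")` and `are_similar("T", "S")` give the same answer.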
Conclusion
• Conservation
  ◦ Similarity is used
  ◦ Phospho-Thr sites are more conserved than Ser and Tyr sites.
  ◦ The Phospho-Thr/Phospho-Ser ratio increases in species farther from human
  ◦ Kinase sites are more conserved than a random site in a substrate
  ◦ Functional activatory sites are more conserved.
• Activation Scores
  ◦ Activatory sites have a higher avg. activation score than inhibition sites, as we expected.
Results
  Measure                              P-Ser     P-Thr     P-Tyr     Total
All P-Sites (Total: 89,298)
  Avg Activation                     0.00390   0.00099  -0.00048   0.00242
  Avg Conservation                     25.62     27.33     30.52     26.98
Functional P-Sites (Total: 769)
  Avg Activation - Activating       0.006557  0.006543  0.018085  0.009709
  Avg Conservation - Activating        35.81     39.36     34.57     36.59
  Avg Activation - Inhibitory       9.26E-05  -0.00634  0.025484     0.003
  Avg Conservation - Inhibitory        30.56     34.34     33.34     31.90
Functional Kinase P-Sites (Total: 183)
  Avg Activation - Activating       0.006025  0.008227  0.016565  0.009931
  Avg Conservation - Activating        37.48     40.51     34.86     37.67
  Avg Activation - Inhibitory       -0.00357  0.002857     0.025  0.005179
  Avg Conservation - Inhibitory        30.29     36.01     33.30     32.47
Kinase P-Sites (Total: 7,121)
  Avg Activation                    0.000331  0.000408  0.003833  0.001276
  Avg Conservation                     27.62     32.34     33.78     30.33
Acknowledgement
• CRD grant from the Natural Sciences and Engineering Research Council (NSERC) of Canada and the MITACS Accelerate Internship Program
• Kinexus Company, for data preparation
Questions