Prediction of Human Protein Kinase Substrate Specificities
Javad Safaei (1), Jan Manuch (1), Arvind Gupta (1), Ladislav Stacho (2), Steven Pelech (3)
1. UBC, Department of Computer Science
2. SFU, Department of Mathematics
3. UBC, Department of Medicine, and Kinexus Bioinformatics Corporation
Cell Signaling Network
• The human body consists of different types of cells
• 23,000 different protein types in cells
• Cell types differ in the level of each protein type
• Defects in the cell signaling network lead to 400 diseases (esp. cancer, diabetes, and Alzheimer's)
• Modeling the network is useful for drug discovery
Major components in cell phosphorylation signaling
• Each component participates in interactions via its domains (d_1, d_2)
• Phosphorylation creates dramatic changes in the 3D structure of proteins, leading to their inhibition or stimulation
• Kinases are S-, S-T-, or Y-specific, according to the residues they phosphorylate
Dynamics of Kinase-Substrate Interaction (Docking)
• The kinase-phospho-site interaction is a kind of key-and-lock model
• Active sites should be close to each other for interaction
• Important factors in the bond:
  • Size and position of amino acids
  • Charge of amino acids
[Figure: a protein substrate docking with a protein kinase; surface sites are marked H, + and -]
cAMP-dependent Protein Kinase Structure (PKA)
• Alanine at the base (0) position
• Isoleucine at the +1 position
• Arginine at the -2 position
• Phospho-S peptide: GRTGRRNSIHPDSAC
• Sub-domains are shown in color
• SDRs (specificity-determining residues) are the key residues helpful for specificity prediction
[Figure: PKA structure with residue labels +1 I, -2 R, P169, E170, L198, P202, L205, E230]
Problem and Dataset nature
• Peptides are usually found in vitro by mass spectrometry
• A peptide is a short sub-sequence of length 15 centered at the phospho-site (S, T, Y)
• Some kinases have many peptides, some have only a few, and some have none
• The problem is to find the PSSM (kinase specificity) of all kinases, given only their primary structure
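The 15-residue window described above can be illustrated with a minimal Python sketch. The function name `extract_peptide`, the 0-based `site_index`, and the `-` padding for sites close to a sequence end are assumptions made for illustration; the slides do not say how such edge cases were handled.

```python
def extract_peptide(sequence, site_index, flank=7, pad="-"):
    """Return the window of 2*flank+1 residues centred on a phospho-site.

    site_index is 0-based. Windows that run past either end of the
    sequence are padded with `pad` (an assumption, not from the slides).
    """
    centre = sequence[site_index]
    if centre not in "STY":
        raise ValueError("phospho-site must be S, T or Y")
    left = sequence[max(0, site_index - flank):site_index]
    right = sequence[site_index + 1:site_index + 1 + flank]
    return left.rjust(flank, pad) + centre + right.ljust(flank, pad)


# The PKA peptide from the slides, with the phospho-serine at index 7.
peptide = extract_peptide("GRTGRRNSIHPDSAC", 7)
```

With this indexing, `peptide[7]` is the phospho-site (position 0), `peptide[8]` is position +1, and `peptide[5]` is position -2.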
Alignments of catalytic domains
• Done with the ClustalW tool
• Refined manually by experts
• Each column is a random variable (RV)
• We can now infer the dynamics of the binding
Charge Matrix R(x_i, y_j)
• Glycine is favoured on the peptide
• Histidine is less positive than the other positively charged amino acids
• S, T, Y are neutral but tend to attract each other
• Proline is neutral and creates a stair-like structure in the protein
Graphical model of the interaction
• Mutual Information
• Charge Dependency (n is the number of training data for each RV)
• Correlation Charge Dependency
[Figure: graphical model connecting the variables X_1 ... X_247 to the variables Y_1 ... Y_15]
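As an illustration of the mutual-information term, the sketch below computes the standard empirical mutual information between two discrete variables, for example one alignment column and one peptide position observed over the same kinase-peptide pairs. The function name and the toy data are hypothetical, and the slides do not specify which estimator or correction was actually used.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, of two paired symbol sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy data: the residue in the kinase column fully determines the peptide residue.
dependent = mutual_information("EEDDKKRR", "RRRRDDDD")
# Toy data: the two residues vary independently.
independent = mutual_information("EEKK", "RDRD")
```

A column whose residues co-vary strongly with a peptide position scores high, which is the kind of signal one would rank columns by.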
Graphical model of the interaction
• Pick the top 7 X variables as SDRs
• Compute the probability of each amino acid on the peptide
• Having trained the model, for a new kinase's aligned catalytic domain we can predict the specificity matrix, knowing only the SDRs
[Figure: SDR variables Z_1 ... Z_7 connected to Y_1 through the terms C_c(Z_1, Y_1), ..., C_c(Z_7, Y_1)]
Compute profile matrix of a kinase without peptide data
• Having trained the model, for a new aligned kinase catalytic domain we can predict the profile matrix, knowing only the SDRs
Data and Process Flow
[Flowchart; box labels as recovered from the slide]
• Sources: Phospho.ELM, Literature, PhosphoSitePlus
• Machine Learning: ANN, SVM, HMM
• 9,125 kinase-phospho peptide pairs for 309 kinase domains
• 550 kinases in human
• Remove atypical kinases
• 500 kinase catalytic domains
• 229 kinases with consensus sequences from 488 kinases
• Compute background (surface) frequency of amino acids
• Compute specificity (PSSM) matrices of 309 kinase domains
• Compute profile matrix of 309 kinase domains with data
• Find SDRs and profile matrix of 500 kinases with no data
• Compute PSSM matrices for 500 domains
• Comparison in Experiment
• Comparison in NetPhorest
• Predictor, Experiment Sites
Definitions
• Background frequency $B(i)$: the probability of amino acid $i$ on the surface, computed from the peptide training data
• Profile matrix of each kinase, $P_k(i,j)$: the probability of amino acid $i$ at position $j$ of the peptides phosphorylated by kinase $k$
• The specificity (PSSM) matrix of a kinase is usually the log-odds ratio $M_k(i,j) = \log\big(P_k(i,j) / B(i)\big)$
• We used the following equation to eliminate $-\infty$ entries in the matrix: $M_k(i,j) = \operatorname{sgn}\big(P_k(i,j) - B(i)\big) \times \big|P_k(i,j) - B(i)\big|^{1.2}$
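The two scoring formulas can be compared in a short Python sketch. Only the formulas come from the slide; the helper names, the natural logarithm, the reading of 1.2 as an exponent, and the pooling of all peptides to estimate B(i) are assumptions made for illustration.

```python
from collections import Counter
from math import log

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def background_frequency(peptides):
    """B(i): frequency of each amino acid over all residues of the given peptides."""
    counts = Counter(a for p in peptides for a in p if a in AMINO_ACIDS)
    total = sum(counts.values())
    return {a: counts[a] / total for a in AMINO_ACIDS}


def profile_matrix(peptides):
    """P_k(i, j) as a list over positions j of {amino acid i: frequency}."""
    length = len(peptides[0])
    profile = []
    for j in range(length):
        counts = Counter(p[j] for p in peptides)
        profile.append({a: counts[a] / len(peptides) for a in AMINO_ACIDS})
    return profile


def log_odds(p, b):
    """log(P / B); an unobserved amino acid (P = 0) gives -infinity."""
    return log(p / b) if p > 0 else float("-inf")


def signed_power(p, b, exponent=1.2):
    """sgn(P - B) * |P - B| ** exponent; always finite."""
    d = p - b
    sign = (d > 0) - (d < 0)
    return sign * abs(d) ** exponent


# Toy two-residue "peptides" for one kinase, only to exercise the helpers.
toy = ["RS", "KS"]
B = background_frequency(toy)
P = profile_matrix(toy)
unseen_log = log_odds(P[0]["A"], 0.05)        # -inf: A never occurs at position 0
unseen_alt = signed_power(P[0]["A"], 0.05)    # finite negative score instead
```

The sketch shows why the alternative is convenient: an amino acid that never occurs at a position still gets a finite, negative score rather than negative infinity.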
Predicted vs. Experimental profile matrices
• Comparison for the 309 kinases for which we have phospho-peptide data
• The prediction recognized (S, S-T, Y)-specific kinases with 100% accuracy, using only their aligned SDRs
Comparison with NetPhorest
NetPhorest has
• 8,746 phosphosite-kinase pairs
• 169 kinases
• 50 kinase groups
• Doesn't work for kinases with no data
• Keeping the best kinase for each site leads to 6,299 site-kinase pairs for comparison
Our Method
• Works for all 500 kinases
• 500 different profile matrices and specificities
• SDRs, yielding information about 3D structure
Future work (1)
• Hybrid recommender systems for prediction
• The sparse utility matrix should be completed
• SDRs are therefore important features in the user spec vector
[Figure: bipartite graph of users/kinases 1 ... N and movies/peptides 1 ... M, with edge weights u_1 ... u_5 and one unknown edge (?); matrices: similarity matrix S (N×N), utility matrix U (N×M), similarity matrix Q (M×M)]
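One simple way to picture the matrix-completion idea is neighbourhood-based collaborative filtering, sketched below. This is a generic textbook technique and not the authors' planned hybrid method: it assumes a kinase-by-peptide utility matrix `U` with `None` for unknown entries and a kinase-by-kinase similarity matrix `S` (for instance derived from SDRs), and every name in it is hypothetical.

```python
def complete_utility(U, S):
    """Fill unknown entries of U with a similarity-weighted average.

    U: list of rows (kinases) over columns (peptides); None marks unknowns.
    S: square kinase-by-kinase similarity matrix.
    An entry stays None when no other kinase has a known value in that
    column or when all relevant similarities are zero.
    """
    n_rows, n_cols = len(U), len(U[0])
    completed = [row[:] for row in U]
    for i in range(n_rows):
        for j in range(n_cols):
            if U[i][j] is not None:
                continue
            weighted_sum = 0.0
            weight_total = 0.0
            for k in range(n_rows):
                if k != i and U[k][j] is not None:
                    weighted_sum += S[i][k] * U[k][j]
                    weight_total += abs(S[i][k])
            if weight_total > 0:
                completed[i][j] = weighted_sum / weight_total
    return completed


# Toy example: kinase 0 has no measurement for peptide 1.
U = [[1.0, None],
     [1.0, 0.0],
     [0.0, 1.0]]
S = [[1.0, 0.75, 0.25],
     [0.75, 1.0, 0.0],
     [0.25, 0.0, 1.0]]
filled = complete_utility(U, S)
```

Here the missing entry is estimated mostly from kinase 1, which is the more similar neighbour, so the estimate lands closer to kinase 1's value than to kinase 2's.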
Future work (2)
• Generalize the method to SH2 and PTB domain proteins, to complete our model of the cell signalling pathway
• Many crystallographic datasets are available from PDB, and computational geometry or vision methods can be applied to them
• As in the user-movie problem, there is a signal strength between SH2 domain proteins and receptor (substrate) proteins