Dynamics of Protein-Protein Interactions: A Probabilistic Model Toward Protein Function Amir Vajdi Computer Science Department University of Massachusetts Boston PhD Dissertation Defense, November 28, 2018 Amir Vajdi (UMB) Protein Function and Dynamic PhD Dissertation Defense 1 / 54
Committee Members
  Prof. Nurit Haspel (Advisor)
  Prof. Kourosh Zarringhalam (Mathematics Department)
  Prof. Dan Simovici
  Prof. Ming Ouyang
My research projects
  Clustering co-expressed genes using time series data (IEEE BIBM 2015)
  Chromosomal structural variation detection using Jaccard distance (IEEE BIBM 2017)
  Computational biomarker discovery for cancer data based on RNA-Seq profiles (2017)
  Identifying significant TFs in the Toxoplasma gondii cell cycle (2018-present)
  Human Gait Database (2017-present)
  Learning structural information as a penalty for protein-protein interface prediction (2017-2018)
  Simulation of protein trajectories between open and closed conformations using the Monte Carlo tree search method (2016-2017)
  Clustering protein conformational changes (BICOB 2016)
1 Biology Background
    Protein Structure
    Protein Function
2 Protein-Protein Interaction Interface Prediction
    Research Problem and Related Work
    Probabilistic Graphical Model
    Our New Proposed Method
3 Simulating Trajectories of Conformational Changes in Proteins and Identifying Intermediate Clusters
    Research Problem
    Monte Carlo Tree Search Method for Simulation of Conformational Changes
    Clustering Conformational Changes using Geometric-Based Distance Function
4 Questions and Answers
Central Dogma of molecular biology
Molecular structure of an amino acid
  Every amino acid has an amino group, an alpha carbon (Cα), and a carboxyl group
  Amino acids differ in their side chains
Four main representations of a protein
Research problem
  Given two proteins A and B, which residues of protein A interact with residues of protein B?
  Two residues are in contact if the distance between them is less than n Å.
  Challenges:
    Large search space
    The interface between two proteins is a small fraction of their surface
    Binding sites behave in complex ways that vary across different complexes
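The contact definition above can be sketched in a few lines of Python. This is only an illustration, not the method used in the dissertation: it assumes each residue is represented by a single 3D point (for example its Cα atom), the function name `contact_pairs` and the toy coordinates are hypothetical, and the 8 Å cutoff is a placeholder for the unspecified threshold n.

```python
import math

def contact_pairs(coords_a, coords_b, cutoff):
    """Return (i, j) pairs where residue i of A is closer than `cutoff` angstroms to residue j of B."""
    pairs = []
    for i, a in enumerate(coords_a):
        for j, b in enumerate(coords_b):
            # Euclidean distance between the two representative points.
            if math.dist(a, b) < cutoff:
                pairs.append((i, j))
    return pairs

# Toy coordinates (hypothetical), one representative point per residue.
protein_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
protein_b = [(0.0, 5.0, 0.0), (30.0, 0.0, 0.0)]

# Assumed cutoff of 8 angstroms; the slide leaves n unspecified.
contacts = contact_pairs(protein_a, protein_b, cutoff=8.0)
```

The double loop visits every residue pair, which makes the "large search space" challenge concrete: the number of candidate pairs grows with the product of the two sequence lengths, while only a few pairs are true contacts.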
Protein binding site properties
  Predictive features of protein binding sites:
    Binding sites are located on the surface of the protein (accessible surface area)
    Amino acid interaction propensity
    Stability of the native complex
    Conformational changes between open and closed structures
    Binding sites are formed as a set of patches
    Conservation of the patch center among homologous proteins
    Co-evolution of the residues neighbouring the patch center among homologous proteins
    Secondary structure (α-helix and β-sheet)
  There is no general rule: protein types behave differently from one another.
Amino acid interaction propensity differs among complex types
  De, Subhajyoti, et al. "Interaction preferences across protein-protein interfaces of obligatory and non-obligatory components are different." BMC Structural Biology (2005)
Related work
  PSICOV
    Jones, David T., et al. "PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments." Bioinformatics (2012)
  GREMLIN
    Ovchinnikov, Sergey, et al. "Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information." eLife (2014)
  MetaPSICOV
    Jones, David T., et al. "MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins." Bioinformatics (2015)
  ComplexContact (RaptorX)
    Zeng, Hong, et al. "ComplexContact: a web server for inter-protein contact prediction using deep learning." Nucleic Acids Research (2018)
An example of a contact map between two proteins
Flowchart of our proposed method
  [Flowchart; boxes: MSA, Structure Geometry, Propensity, Penalty Matrix, Graphical Models, Post-processing, Interface Patches]
Probabilistic Graphical Models
  Li, Yupeng, and Scott A. Jackson. "Gene network reconstruction by integration of prior biological knowledge." G3: Genes, Genomes, Genetics (2015)
Graphical interpretation
  Multiple sequence alignment, with one variable per column:
    X1 X2 X3 X4 X5 X6 X7
    S  Y  C  H  M  F  L
    F  Y  P  W  A  R  A
    S  Y  K  H  G  R  Q
    S  Y  G  H  Q  F  Q
    F  Y  N  W  Q  R  M
    S  Y  R  H  Q  R  M
    F  Y  K  W  A  F  L
    F  Y  R  W  R  F  L
  [Figure: graph over the nodes X1, X2, X3, X4, X5, X6, X7]
Gaussian Graphical Model (GGM)
  Probability density function of a sequence X:
    $$ f_{\mu,\Sigma}(x) = (2\pi)^{-L/2} \det(\Sigma)^{-1/2} \exp\left( -\tfrac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right), \quad x \in \mathbb{R}^L $$
  Taking the trace inner product of the above gives
    $$ f_{\mu,\Sigma}(x) = \exp\left( \mu^T \theta x - \left\langle \theta, \tfrac{1}{2} x x^T \right\rangle - \tfrac{L}{2} \log(2\pi) + \tfrac{1}{2} \log(\det(\theta)) - \tfrac{1}{2} \mu^T \theta \mu \right) $$
  where $\theta = \Sigma^{-1}$ is the inverse of the covariance matrix.
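As a quick sanity check, the two ways of writing the Gaussian density (in terms of the covariance matrix, and in terms of its inverse θ with the trace inner product) can be compared numerically. The sketch below assumes NumPy; the function names and the toy values of μ, Σ, and x are hypothetical and chosen only for illustration.

```python
import numpy as np

def density_covariance_form(x, mu, sigma):
    """Gaussian density written with the covariance matrix sigma."""
    L = x.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)  # (x - mu)^T sigma^{-1} (x - mu)
    return (2 * np.pi) ** (-L / 2) * np.linalg.det(sigma) ** (-0.5) * np.exp(-0.5 * quad)

def density_precision_form(x, mu, theta):
    """Same density written with theta = sigma^{-1} and the trace inner product."""
    L = x.shape[0]
    inner = np.trace(theta @ (0.5 * np.outer(x, x)))  # <theta, x x^T / 2>
    _, logdet = np.linalg.slogdet(theta)
    exponent = (mu @ theta @ x - inner - 0.5 * L * np.log(2 * np.pi)
                + 0.5 * logdet - 0.5 * (mu @ theta @ mu))
    return np.exp(exponent)

# Toy parameters (hypothetical); sigma is symmetric and diagonally dominant,
# hence positive definite.
sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.4],
                  [0.0, 0.4, 1.5]])
mu = np.array([0.5, -1.0, 0.2])
x = np.array([1.0, 0.0, -0.5])

p_cov = density_covariance_form(x, mu, sigma)
p_prec = density_precision_form(x, mu, np.linalg.inv(sigma))
```

Both functions should return the same value up to floating-point error, since the second form only expands the quadratic term and replaces det(Σ)^(-1/2) by det(θ)^(1/2).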
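Objective function of GGM
  Maximum likelihood estimation is based on S, the empirical (sample) covariance matrix:
    $$ S = \frac{1}{n} \sum_{i=1}^{n} (X^{(i)} - \bar{X})(X^{(i)} - \bar{X})^T, \quad \text{where } \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X^{(i)} $$
  The log-likelihood satisfies
    $$ \ell(\mu, \Sigma) \propto -\frac{n}{2} \log(\det(\Sigma)) - \frac{n}{2} \operatorname{tr}(S \Sigma^{-1}) - \frac{n}{2} (\bar{X} - \mu)^T \Sigma^{-1} (\bar{X} - \mu) $$
  Setting $\mu = \bar{X}$ and writing $\theta = \Sigma^{-1}$ leads to
    $$ \max_{\hat{\theta}} \; \log\det(\hat{\theta}) - \operatorname{tr}(S \hat{\theta}) \qquad (1) $$
  Adding an $L_1$ penalty to the above gives
    $$ \max_{\theta} \; \log(\det \theta) - \operatorname{tr}(S \theta) - \Lambda \|\theta\|_1 \qquad (2) $$
  $\Lambda$ is a penalty matrix of the same size as S, so the penalty term is read entrywise as $\sum_{i,j} \Lambda_{ij} |\theta_{ij}|$.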
Blockwise coordinate descent
  The objective function is solved with the Graphical Lasso (GLasso) method, which applies a coordinate descent approach.
    $$ \omega = \begin{pmatrix} \omega_{11} & \hat{\omega}_{12} \\ \hat{\omega}_{12}^T & \omega_{22} \end{pmatrix}, \qquad S = \begin{pmatrix} S_{11} & \hat{s}_{12} \\ \hat{s}_{12}^T & s_{22} \end{pmatrix} $$
  where $\omega_{11}, S_{11} \in \mathbb{R}^{(L-1) \times (L-1)}$, $\hat{\omega}_{12}, \hat{s}_{12}$ are vectors of size $L-1$, and $\omega_{22}, s_{22}$ are scalars.
  Start with $\omega = S + \Lambda I$ and update $\omega$ iteratively:
    $$ \hat{\omega}_{12} = \arg\min_{y} \left\{ y^T \omega_{11}^{-1} y : \|y - \hat{s}_{12}\|_\infty \le \Lambda \right\} $$
  The solution $\hat{\omega}_{12}$ of the above problem is the same as the solution $\beta$ of the following Lasso problem, since $\hat{\omega}_{12} = \omega_{11} \beta$:
    $$ \min_{\beta} \left\{ \tfrac{1}{2} \left\| \omega_{11}^{1/2} \beta - b \right\|^2 + \Lambda \|\beta\|_1 \right\}, \quad \text{where } b = \omega_{11}^{-1/2} \hat{s}_{12} $$
  Friedman, J. H., et al. "Sparse inverse covariance estimation with the graphical lasso." Biostatistics (2008)
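The block update can be sketched in NumPy as follows. This is a minimal illustration in the spirit of the graphical lasso of Friedman et al., not the implementation used in the dissertation. As a simplifying assumption it uses a single scalar penalty `lam` instead of the penalty matrix Λ; the function names, iteration limits, tolerance, and the toy matrix are all hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def graphical_lasso_sketch(S, lam, n_sweeps=100, n_inner=100, tol=1e-8):
    """Return (W, Theta): covariance estimate and sparse inverse covariance estimate."""
    L = S.shape[0]
    W = S + lam * np.eye(L)      # starting point omega = S + lam * I
    B = np.zeros((L, L - 1))     # lasso coefficients beta, one row per column j
    for _sweep in range(n_sweeps):
        W_old = W.copy()
        for j in range(L):
            idx = np.arange(L) != j
            W11 = W[np.ix_(idx, idx)]
            s12 = S[idx, j]
            beta = B[j]          # view: updates are stored back into B
            # Coordinate descent on 0.5 * b^T W11 b - s12^T b + lam * ||b||_1.
            for _it in range(n_inner):
                beta_old = beta.copy()
                for k in range(L - 1):
                    r = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
                    beta[k] = soft_threshold(r, lam) / W11[k, k]
                if np.max(np.abs(beta - beta_old)) < tol:
                    break
            w12 = W11 @ beta     # omega_12 = omega_11 * beta
            W[idx, j] = w12
            W[j, idx] = w12
        if np.max(np.abs(W - W_old)) < tol:
            break
    # Recover the inverse covariance from W and the stored betas.
    Theta = np.zeros((L, L))
    for j in range(L):
        idx = np.arange(L) != j
        theta_jj = 1.0 / (W[j, j] - W[idx, j] @ B[j])
        Theta[j, j] = theta_jj
        Theta[idx, j] = -B[j] * theta_jj
    return W, 0.5 * (Theta + Theta.T)

# Toy covariance matrix (hypothetical).
S_toy = np.array([[1.0, 0.5, 0.1],
                  [0.5, 1.0, 0.3],
                  [0.1, 0.3, 1.0]])
W_est, Theta_est = graphical_lasso_sketch(S_toy, lam=0.2)
```

In this toy run the weak correlation between the first and third variables is shrunk to an exact zero in `Theta_est`, while the stronger couplings survive. A zero entry in the inverse covariance means no edge between the two variables in the graphical model, which is what makes the estimate usable for predicting residue-residue contacts.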