Accurate prediction for atomic-level protein design and its application in diversifying the near-optimal sequence space
Pablo Gainza
CPS 296: Topics in Computational Structural Biology, Department of Computer Science, Duke University
Outline
1) Problem definition
2) Formulation as an inference problem
3) Graphical Models
4) tBMMF algorithm
5) Results
6) Conclusions
1. Problem Definition
Inputs: rotamer library; positions to design and allowed rotamers/amino acids; energy function f(x); protein structure.
Inputs → protein design algorithm → GMEC $\in S$
www.cs.duke.edu/donaldlab
1. Problem Definition (2)
$r$ → rotamer assignment (RA)
$r_i$ → rotamer at position $i$ for RA $r$
[Figure: rotamer assignment $r$ over positions 1 and 2, with $r_1$ labeled]
1. Problem Definition (3)
$E_i(r_i)$ → energy between rotamer $r_i$ and the fixed backbone
$E_{ij}(r_i, r_j)$ → energy between rotamers $r_i$ and $r_j$
$E(r)$ → energy of rotamer assignment $r$
$E(r) = \sum_{i} E_i(r_i) + \sum_{i,j} E_{ij}(r_i, r_j)$
[Figure: positions 1 and 2 of assignment $r$, labeled with $E_i(r_1)$ and $E_{ij}(r_1, r_2)$]
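The energy function above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the dictionary layout, the function name `total_energy`, and all toy energy values are assumptions made for the example. Each pair of positions is stored once in `pair`, so whether the $i,j$ sum counts a pair once or in both orders is decided by what the caller stores.

```python
from typing import Dict, Tuple

# Hypothetical data layout (an assumption, not from the slides):
#   single[i][rot]         -> E_i(rot): rotamer vs. fixed backbone
#   pair[(i, j)][(ri, rj)] -> E_ij(ri, rj): rotamer vs. rotamer
SingleE = Dict[int, Dict[str, float]]
PairE = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]


def total_energy(r: Dict[int, str], single: SingleE, pair: PairE) -> float:
    """E(r) = sum_i E_i(r_i) + sum over stored pairs (i, j) of E_ij(r_i, r_j)."""
    energy = sum(single[i][r[i]] for i in r)
    energy += sum(table[(r[i], r[j])] for (i, j), table in pair.items())
    return energy


# Toy values, invented for illustration only.
single = {1: {"a": -1.0, "b": -2.0}, 2: {"c": -5.0, "d": -3.0}}
pair = {(1, 2): {("a", "c"): -2.0, ("a", "d"): -4.0,
                 ("b", "c"): 0.5, ("b", "d"): -1.0}}

energy_ad = total_energy({1: "a", 2: "d"}, single, pair)
energy_bc = total_energy({1: "b", 2: "c"}, single, pair)
```

With these toy numbers, assignment (a, d) has a lower energy than (b, c) because its pairwise term is strongly favorable.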
1. Problem Definition (4)
T(k) → returns the amino acid type of rotamer k
T(r) → returns the sequence of rotamer assignment r
Example: T($r_1$) = hexagon, T($r_2$) = cross, so T($r$) = (hexagon, cross)
1. Problem Definition (5)
Inputs: rotamer library; positions to design and allowed rotamers/amino acids; energy function f(x); protein structure.
Inputs → protein design algorithm → GMEC $\in S$
$S^* = T(\arg\min_{r} E(r))$
1. Problem Definition (6): Related Work
● Exact methods (BWM, DEE / A*, BroMAP): Global Minimum Energy Conformation
● Probabilistic methods (SCMF, MCSA): low energy conformation
1. Problem Definition (7)
Model: rotamer library; positions to design and allowed rotamers/amino acids; energy function f(x); protein structure.
The model is inaccurate!
1. Problem Definition (8)
Protein design algorithm: $S^* = T(\arg\min_{r} E(r))$
Algorithm: fast or provable
1. Problem Definition (9)
A low energy conformation may:
● have low binding specificity
● not fold to the target
● be too stable
1. Problem Definition (10)
Solution: find a set of low energy sequences.
● Provable methods (DEE/A*): ordered set of gap-free low energy conformations, including the GMEC
● Probabilistic methods (tBMMF): set of low energy conformations
Problem Definition: Summary
● Protein design algorithms search for the sequence of the Global Minimum Energy Conformation (GMEC).
● Our model is inaccurate, so more than one low energy sequence is desirable.
● Fromer et al. propose tBMMF to generate a set of low energy sequences.
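To make the goal "a set of low energy sequences" concrete, here is a brute-force sketch for a tiny design problem. It is not tBMMF and would not scale to real designs; it only shows what the desired output looks like. The function name `low_energy_sequences`, the rotamer labels, and the toy energy table are all hypothetical.

```python
from itertools import product


def low_energy_sequences(rotamers, aa_type, energy, k):
    """Enumerate every rotamer assignment, keep the best energy found for
    each amino acid sequence T(r), and return the k lowest-energy sequences."""
    best = {}
    for r in product(*rotamers):
        seq = tuple(aa_type[rot] for rot in r)  # T(r): sequence of assignment r
        e = energy(r)
        if seq not in best or e < best[seq]:
            best[seq] = e
    return sorted(best.items(), key=lambda item: item[1])[:k]


# Toy problem, invented for illustration: two positions, five rotamers.
rotamers = [["L1", "L2", "V1"], ["S1", "T1"]]
aa_type = {"L1": "LEU", "L2": "LEU", "V1": "VAL", "S1": "SER", "T1": "THR"}
toy_energy = {("L1", "S1"): -3.0, ("L1", "T1"): -1.0,
              ("L2", "S1"): -2.5, ("L2", "T1"): -4.0,
              ("V1", "S1"): -3.5, ("V1", "T1"): -0.5}

top3 = low_energy_sequences(rotamers, aa_type, toy_energy.get, k=3)
```

Several rotamer assignments can map to the same sequence, which is why the sketch keeps only the best energy per sequence before ranking.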
2. Our problem as an inference problem
Probabilistic factor for self-interactions: $\psi_i(r_i) = e^{-E_i(r_i)/T}$
Probabilistic factor for pairwise interactions: $\psi_{ij}(r_i, r_j) = e^{-E_{ij}(r_i, r_j)/T}$
2. Inference problem (2)
Partition function: $Z = \sum_{r} e^{-E(r)/T}$
Probability distribution for rotamer assignment $r$:
$P(r_1, \ldots, r_N) = \frac{1}{Z} \prod_{i} \psi_i(r_i) \prod_{i,j} \psi_{ij}(r_i, r_j) = \frac{1}{Z} e^{-E(r)/T}$
2. Inference problem (3)
Minimization goal (from the problem definition): $S^* = T(\arg\min_{r} E(r))$
Equivalent maximization goal for a graphical model problem: $S^* = T(\arg\max_{r} \Pr(r))$
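The two goals select the same rotamer assignment. A short derivation, assuming $T > 0$ and using only the definitions of $Z$ and $P(r)$ given earlier:

```latex
\begin{aligned}
\arg\max_{r} P(r)
  &= \arg\max_{r} \frac{1}{Z}\, e^{-E(r)/T}
     && \text{definition of } P(r) \\
  &= \arg\max_{r} e^{-E(r)/T}
     && Z > 0 \text{ does not depend on } r \\
  &= \arg\max_{r} \left( -\frac{E(r)}{T} \right)
     && \exp \text{ is strictly increasing} \\
  &= \arg\min_{r} E(r)
     && T > 0
\end{aligned}
```

Note that $Z$ never has to be computed to find the maximizer, since it scales every assignment's probability by the same factor.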
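2. Inference problem (4). Example: Inference problem
Two positions and two allowed rotamer assignments, $r'$ and $r''$.
Singleton energies: $E_i(r'_1) = E_i(r''_1) = -1$; $E_i(r'_2) = -5$; $E_i(r''_2) = -3$
Pairwise energies: $E_{ij}(r'_1, r'_2) = -2$; $E_{ij}(r''_1, r''_2) = -4$
$E(r') = ?$
$E(r'') = ?$
What is our GMEC?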
2. Inference problem (5)
$E(r') = (-1 + -2) + (-5 + -2) = -10$
$E(r'') = (-1 + -4) + (-3 + -4) = -12$
Each bracket is one position's singleton energy plus the pairwise energy $E_{ij}$.
$r''$ is our GMEC.
2. Inference problem (6)
$\psi_i(r'_1) = e^{-E_i(r'_1)/T} = e^{1}$
$\psi_i(r'_2) = e^{-E_i(r'_2)/T} = e^{5}$
$\psi_i(r''_1) = e^{-E_i(r''_1)/T} = e^{1}$
$\psi_i(r''_2) = e^{-E_i(r''_2)/T} = e^{3}$
$T = 1$ (for our example)
2. Inference problem (7)
$\psi_{ij}(r'_1, r'_2) = e^{-E_{ij}(r'_1, r'_2)/T} = e^{2}$
$\psi_{ij}(r''_1, r''_2) = e^{-E_{ij}(r''_1, r''_2)/T} = e^{4}$
$Z = \sum_{r} e^{-E(r)/T} = e^{10} + e^{12}$
$T = 1$ (for our example)
2. Inference problem (8)
$P(r'_1, r'_2) = \frac{1}{Z} \prod_{i} \psi_i(r'_i) \prod_{i,j} \psi_{ij}(r'_i, r'_j) = \frac{e^{10}}{e^{10} + e^{12}}$
$T = 1$ (for our example)
2. Inference problem (9)
$P(r''_1, r''_2) = \frac{1}{Z} \prod_{i} \psi_i(r''_i) \prod_{i,j} \psi_{ij}(r''_i, r''_j) = \frac{e^{12}}{e^{10} + e^{12}}$
$T = 1$ (for our example)
2. Inference problem (10)
$S^* = T(\arg\max_{r} \Pr(r))$
$S^* = T(r'')$
$T = 1$ (for our example)
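The worked example can be checked numerically. The sketch below assumes the table values used in the example ($E_i = -1$ at position 1 for both assignments, $E_i = -5$ and $-3$ at position 2, $E_{ij} = -2$ and $-4$) and follows the slides' arithmetic, in which the pairwise energy is added once from each position. The variable names are hypothetical.

```python
import math

T = 1.0  # temperature used in the example

# Energies read from the example tables, keyed by rotamer assignment.
E_single = {1: {"r'": -1.0, "r''": -1.0}, 2: {"r'": -5.0, "r''": -3.0}}
E_pair = {"r'": -2.0, "r''": -4.0}


def energy(name: str) -> float:
    # Slide arithmetic: (E_1 + E_12) + (E_2 + E_12), pair term counted twice.
    return (E_single[1][name] + E_pair[name]) + (E_single[2][name] + E_pair[name])


energies = {name: energy(name) for name in ("r'", "r''")}

# Partition function and Boltzmann probabilities.
Z = sum(math.exp(-e / T) for e in energies.values())
probs = {name: math.exp(-e / T) / Z for name, e in energies.items()}

gmec = min(energies, key=energies.get)        # arg min E(r)
map_assignment = max(probs, key=probs.get)    # arg max P(r)
```

Both selection rules return the same assignment, and the two probabilities sum to one, as expected for a normalized distribution.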
2. Inference problem (11)
Minimization goal (from the problem definition): $S^* = T(\arg\min_{r} E(r))$
Equivalent maximization goal for a graphical model problem: $S^* = T(\arg\max_{r} \Pr(r))$
We still have a non-polynomial problem! But it is now formulated as an inference problem, so probabilistic methods can be applied.
Summary: Inference problem
● We model our problem as an inference problem.
● We can use probabilistic methods to solve it.
3. Graphical models for protein design and belief propagation (BP)
1. Model each design position as a random variable.
2. Build an interaction graph that shows conditional independence between variables.
[Figure: SspB dimer interface, inter-monomeric interactions (Cα). Source: Fromer M, Yanover C. Proteins (2008)]
3. Graphical Models/BP (2). Example: Belief propagation
● Node in the graphical model: an interacting residue in the structure.
● Node in the graphical model: a random variable.
[Figure: residues 1, 2 and 3 in the structure and the corresponding nodes of the graphical model]
3. Graphical Models/BP (3). Example: Belief propagation
● Edge: energy interaction between two residues.
● Edge: causal relationship between two nodes.
● If two residues are distant from each other, there is no edge between them.
[Figure: residues 1 to 4 and the graphical model; residue 4 is distant from the others]
3. Graphical Models/BP (4). Example: Belief propagation
Every random variable can be in one of several states: the allowable rotamers for that position.
[Figure: nodes 1, 2 and 3 with their allowed rotamers $r'_1$, $r'_2$, $r'_3$, $r''_3$]
3. Graphical Models/BP (5). Example: Belief propagation
The energy of each state depends on:
- its singleton energy
- its pairwise energies
- the energies of the states of its parents
3. Graphical Models/BP (6). Example: Belief propagation
Belief propagation: each node tells its neighboring nodes what it believes their state should be.
A message $m_{i \to j}$ is sent from node $i$ to node $j$.
The message is a vector with one dimension per allowed state (rotamer) of the recipient, e.g. $m_{2 \to 3}(r'_3)$ and $m_{2 \to 3}(r''_3)$.
3. Graphical Models/BP (7). Example: Belief propagation
Who sends the first message?
3. Graphical Models/BP (8). Example: Belief propagation
Who sends the first message?
In a tree: the leaves. Belief propagation is proven to be correct in a tree!
3. Graphical Models/BP (9). Example: Belief propagation
Who sends the first message?
In a graph with cycles:
● Set initial values
● Send in parallel
No guarantees can be made! The messages might not converge.
3. Graphical Models/BP (10). Example: Belief propagation
Initial values:
$m_{1 \to 2}(r'_2) = 1$, $m_{3 \to 2}(r'_2) = 1$
$m_{2 \to 1}(r'_1) = 1$, $m_{3 \to 1}(r'_1) = 1$
$m_{1 \to 3}(r'_3) = 1$, $m_{1 \to 3}(r''_3) = 1$
$m_{2 \to 3}(r'_3) = 1$, $m_{2 \to 3}(r''_3) = 1$
We iterate from there.
3. Graphical Models/BP (11). Example: Belief propagation
Node 3 receives messages from nodes 1 and 2:
$m_{2 \to 3}(r'_3) = 1$, $m_{2 \to 3}(r''_3) = 1$
$m_{1 \to 3}(r'_3) = 1$, $m_{1 \to 3}(r''_3) = 1$
3. Graphical Models/BP (12). Example: Belief propagation
What message does node 3 send to node 1 on the next iteration?
$m_{3 \to 1}(r'_1) = ?$
Belief propagation: message passing
$N(i)$ → neighbors of variable $i$
Message that gets sent on each iteration:
$m_{i \to j}(r_j) = \max_{r_i} \; e^{\left(-E_i(r_i) - E_{ij}(r_i, r_j)\right)/t} \prod_{k \in N(i) \setminus j} m_{k \to i}(r_i)$
Here $t$ is the temperature, written $T$ in the factor definitions.
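The update rule above can be turned into a small max-product belief propagation sketch. Several choices here are assumptions rather than anything stated in the slides: messages start at 1 and are all updated in parallel, each message is normalized after every update for numerical stability, each edge's pairwise energy is stored and counted once, and the final state of a node is taken as the maximizer of its belief (its singleton factor times all incoming messages). The function name and the toy energies are hypothetical.

```python
import math


def max_product_bp(states, E_single, E_pair, T=1.0, iters=20):
    """states: {node: [state, ...]}; E_single: {node: {state: energy}};
    E_pair: {(i, j): {(s_i, s_j): energy}} with each edge stored once."""

    def pair_energy(i, si, j, sj):
        if (i, j) in E_pair:
            return E_pair[(i, j)][(si, sj)]
        return E_pair[(j, i)][(sj, si)]

    neighbors = {i: set() for i in states}
    for i, j in E_pair:
        neighbors[i].add(j)
        neighbors[j].add(i)

    # Initial values: every message entry is 1.
    msgs = {(i, j): {sj: 1.0 for sj in states[j]}
            for i in states for j in neighbors[i]}

    for _ in range(iters):
        new_msgs = {}
        for (i, j) in msgs:  # parallel update: read old msgs, write new_msgs
            out = {}
            for sj in states[j]:
                out[sj] = max(
                    math.exp((-E_single[i][si] - pair_energy(i, si, j, sj)) / T)
                    * math.prod(msgs[(k, i)][si] for k in neighbors[i] - {j})
                    for si in states[i]
                )
            norm = sum(out.values())  # normalization is an added assumption
            new_msgs[(i, j)] = {s: v / norm for s, v in out.items()}
        msgs = new_msgs

    assignment = {}
    for i in states:
        belief = {si: math.exp(-E_single[i][si] / T)
                  * math.prod(msgs[(k, i)][si] for k in neighbors[i])
                  for si in states[i]}
        assignment[i] = max(belief, key=belief.get)
    return assignment


# Toy chain 1 - 2 - 3 (a tree), with invented energies.
states = {1: ["a", "b"], 2: ["a", "b"], 3: ["a", "b"]}
E_single = {1: {"a": -1.0, "b": 0.0}, 2: {"a": 0.0, "b": -0.5},
            3: {"a": -2.0, "b": -1.0}}
E_pair = {(1, 2): {("a", "a"): 0.0, ("a", "b"): -3.0,
                   ("b", "a"): -1.0, ("b", "b"): 0.0},
          (2, 3): {("a", "a"): -1.0, ("a", "b"): 0.0,
                   ("b", "a"): 0.0, ("b", "b"): -2.5}}

bp_assignment = max_product_bp(states, E_single, E_pair)
```

The toy graph is a chain, so the result can be compared against exhaustive enumeration of all eight assignments. On a graph with cycles the same loop runs unchanged, but the fixed iteration count is then only a heuristic stopping rule.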
Example: Belief propagation
Pairwise energies $E_{ij}(r_2, r_3)$ (position #2 vs. position #3): $-1$ for $r'_3$, $-3$ for $r''_3$
Pairwise energies $E_{ij}(r_1, r_3)$ (position #1 vs. position #3): $-1$ for $r'_3$, $-4$ for $r''_3$
[Tables: pairwise energies $E_{ij}(r_1, r_2)$ and singleton energies $E_i(r_1)$, $E_i(r_2)$, $E_i(r_3)$; values $-6$, $-2$, $-1$, $-4$, $-2$ in extraction order]
Iteration 0:
$m_{3 \to 1}(r'_1) = \max_{r_3} \left\{ e^{\left(-E_i(r'_3) - E_{ij}(r'_3, r'_1)\right)/t} \, m_{2 \to 3}(r'_3), \; e^{\left(-E_i(r''_3) - E_{ij}(r''_3, r'_1)\right)/t} \, m_{2 \to 3}(r''_3) \right\} = ?$