Protein Folding Simulation in Concurrent Constraint Programming


  1. Protein Folding Simulation in Concurrent Constraint Programming. Luca Bortolussi, Alessandro Dal Palù, Agostino Dovier (DIMI, Univ. of Udine, IT); Federico Fogolari (DST, Univ. of Verona, IT)

  2. Outline of the talk • Introduction • Concurrent framework • Testing model • Results • Future Work (L. Bortolussi, A. Dal Palù, A. Dovier, and F. Fogolari, BIOCONCUR 2004, slide 2 of 20)

  3. Proteins • Proteins are abundant in nature and fundamental to life. • The diversity of 3D protein structures underlies the very wide range of their functions (enzymes, storage, transport, messengers, antibodies, regulation, mechanical support). • A protein is a polymer chain made of monomers (aminoacids) of 20 different kinds. • Aminoacids have a common part (6 atoms) and a distinguishing part (from 1 to 18 atoms). • They are typically identified by one letter in {A, ..., Z} \ {B, J, O, U, X, Z}.

  4. Proteins • The Primary Structure is the sequence of aminoacids constituting a protein. • The Secondary Structures of a protein are local structures (α-helices, β-sheets) whose formation is driven by local forces. • The Tertiary Structure, which determines macroscopic properties and biological functions, is the 3D conformation of the protein.

  5. Example: Protein 1ENH • Primary Structure: R,P,R,T,A,F,S,S,E,Q, L,A,R,L,K,R,E,F,N,E, N,R,Y,L,T,E,R,R,R,Q, Q,L,S,S,E,L,G,L,N,E, A,Q,I,K,I,W,F,Q,N,K, R,A,K,I • Tertiary Structure: [Figures: all-atom model and simplified model]
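As a quick sanity check of the one-letter alphabet and of the 1ENH sequence quoted on the slide, here is a minimal Python sketch. The names `AMINO_LETTERS`, `ENH_1` and `is_valid_primary_structure` are illustrative and do not come from the talk.

```python
import string

# One-letter codes: the uppercase alphabet minus the six unused letters.
AMINO_LETTERS = set(string.ascii_uppercase) - set("BJOUXZ")

# Primary structure of 1ENH, copied from the slide in groups of ten.
ENH_1 = ("RPRTAFSSEQ" "LARLKREFNE" "NRYLTERRRQ"
         "QLSSELGLNE" "AQIKIWFQNK" "RAKI")


def is_valid_primary_structure(seq):
    """Return True if every symbol is one of the 20 aminoacid letters."""
    return all(ch in AMINO_LETTERS for ch in seq)


print(len(AMINO_LETTERS), len(ENH_1), is_valid_primary_structure(ENH_1))
```

The set difference leaves exactly 20 letters, matching the 20 kinds of aminoacids mentioned earlier in the talk.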

  6. The Protein Structure Prediction Problem • Proteins fold in a given environment (e.g. water) to form a very specific geometric pattern (native state/conformation). • The native conformation is relatively stable and unique, and corresponds to a state that minimizes the global free energy. • The Protein Structure Prediction problem (PSP) consists in predicting the Tertiary Structure of a protein, given its Primary Structure.

  7. Approaches to PSP Several different approaches have been used to tackle PSP: • Homology modelling and fold recognition; • All-atom simulation using molecular dynamics; • Constraint-based approaches on lattices; • Ab initio simulations using simplified models.

  8. CCP simulation framework • We encoded the PSP problem in the Concurrent Constraint (Logic) Programming paradigm. • Each aminoacid is associated with an independent process. • Each process communicates with the others and reacts to changes in their spatial positions. • The framework is independent of the spatial model of the protein and of the energy model.

  9. Communication Strategy • Each process, before performing a move, waits for the communication of a movement of some other aminoacid; • The position changes of process P_i are stored in a list L_i of logic terms (leaving the tail variable uninstantiated), so that each process keeps track of the entire folding history known to it; • Each process, while moving, uses the most recent information available to it, i.e. the last ground terms of the lists L_i; • Each process, once it has moved, communicates its new position to all other processes by updating its list.
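The list with an uninstantiated tail can be pictured as a linked list whose last cell has no successor yet. The following single-threaded Python sketch is only an analogy: `Cell`, `get_last_cell`, `tell` and `history` are hypothetical names, and `None` stands in for the unbound logic variable.

```python
class Cell:
    """One cons cell [head | tail]; tail is None while still unbound."""

    def __init__(self, head):
        self.head = head
        self.tail = None  # stands in for the uninstantiated tail variable


def get_last_cell(cell):
    """Walk to the last cell, i.e. the one whose tail is still unbound."""
    while cell.tail is not None:
        cell = cell.tail
    return cell


def tell(stream, position):
    """Bind the open tail to a new cell, extending the history by one move."""
    new_cell = Cell(position)
    get_last_cell(stream).tail = new_cell
    return new_cell


def history(stream):
    """Return all ground positions recorded so far, oldest first."""
    out = []
    while stream is not None:
        out.append(stream.head)
        stream = stream.tail
    return out


# L_i for one process: it starts with the initial position only.
L_i = Cell((0.0, 0.0, 0.0))
tell(L_i, (0.1, 0.0, 0.0))
tell(L_i, (0.1, 0.2, 0.0))
print(get_last_cell(L_i).head)  # most recent position known for this process
print(history(L_i))             # full folding history of this process
```

Because cells are only ever appended, a reader holding any cell sees a consistent prefix of the history, which is the property the communication strategy relies on.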

  10. Abstract CCP program

        simulation(S):-
            Init=[[I1|_],
                  [I2|_],
                  ...,
                  [In|_]],
            run(1,S,Init) ||
            run(2,S,Init) ||
            ... ||
            run(n,S,Init).

        run(ID, S, [P1, P2, ..., Pn]):-
            getTails([P1, ..., Pn],[T1, ..., Tn]),
            ask(T1=[_|_]) -> skip +
            ask(T2=[_|_]) -> skip +
            ... +
            ask(Tn=[_|_]) -> skip,
            getLast([P1, ..., Pn],[L1, ..., Ln]),
            updatePosition(ID,S,[L1, ..., Ln],NP),
            tell(TID=[NP|_]),
            run(ID,S,[P1, ..., Pn]).

      • The main clause is simulation. • Init is a variable containing n lists, each holding an initial position Ii. • n concurrent calls to run are started, one for each process (aminoacid).

  11. Abstract CCP program (continued) • ID is the identification code of the aminoacid. • getTails gets the tails of the lists P1,...,Pn and assigns them to the variables T1,...,Tn. • Then the process waits for one of these variables to be instantiated, with ask(Ti=[_|_]) -> skip.

  12. Abstract CCP program (continued) • Once this happens, the process retrieves the latest information with getLast. • Then it updates its position with updatePosition and communicates its move to all other processes by means of tell. • Finally, the run procedure is called recursively.
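The ask / getLast / updatePosition / tell cycle of the run clause can be mimicked with threads and a condition variable. This is a rough Python analogy, not the authors' Mozart code. It rests on several assumptions of the sketch: `update_position` is a placeholder with no energy model, a global move budget (`BUDGET`) stops the run, and process 0 makes the first move without waiting so that the processes do not all block at start-up.

```python
import random
import threading

N = 4        # number of aminoacid processes (illustrative)
BUDGET = 20  # total number of moves before the sketch stops (assumption)

# One history per process, each starting with its initial position.
histories = [[(float(i), 0.0, 0.0)] for i in range(N)]
cond = threading.Condition()
stop = threading.Event()


def others_total(pid):
    """Number of positions told so far by every process except pid."""
    return sum(len(h) for j, h in enumerate(histories) if j != pid)


def update_position(pid, latest, rng):
    """Placeholder move: small random displacement, no energy model."""
    return tuple(c + rng.uniform(-0.1, 0.1) for c in latest[pid])


def run(pid, kick):
    rng = random.Random(pid)
    seen = N - 1  # the initial positions of the other processes
    while True:
        with cond:
            # ask: suspend until some other process has told a new position
            cond.wait_for(
                lambda: stop.is_set() or kick or others_total(pid) > seen)
            if stop.is_set():
                return
            kick = False
            seen = others_total(pid)
            latest = [h[-1] for h in histories]      # getLast
        new_pos = update_position(pid, latest, rng)  # updatePosition
        with cond:
            if stop.is_set():
                return
            histories[pid].append(new_pos)           # tell
            if sum(len(h) for h in histories) - N >= BUDGET:
                stop.set()
            cond.notify_all()


threads = [threading.Thread(target=run, args=(i, i == 0)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
moves = sum(len(h) for h in histories) - N
print("moves performed:", moves)
```

The process that told most recently is the only one guaranteed to be waiting, so with at least two processes some process can always proceed until the budget is reached.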

  13. Movement Strategy The procedure updatePosition works as follows: • The aminoacid randomly chooses a new position, close to the current one, within a given step; • Using the most recent information available about the spatial positions of the other processes, it computes the energy associated with this choice; • It accepts the position using a Monte Carlo criterion: - If the new energy is lower than the current one, it accepts the move; - If the new energy is greater than the current one, it accepts the move with probability $e^{-(E_{new} - E_{current})/T}$. • This procedure depends on the spatial model adopted.
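The acceptance rule above is the usual Metropolis test. A minimal Python sketch follows; the function name `accept_move` and its parameters are illustrative, and the temperature is assumed to be expressed in the same units as the energy.

```python
import math
import random


def accept_move(e_new, e_current, temperature, rng=random):
    """Metropolis test: always accept moves that do not raise the energy;
    accept uphill moves with probability exp(-(e_new - e_current) / T)."""
    if e_new <= e_current:
        return True
    return rng.random() < math.exp(-(e_new - e_current) / temperature)


# Empirical acceptance rate for an uphill move of one energy unit at T = 1;
# it should be close to exp(-1).
rng = random.Random(0)
trials = 20000
rate = sum(accept_move(1.0, 0.0, 1.0, rng) for _ in range(trials)) / trials
print(rate, math.exp(-1.0))
```

Higher temperatures make uphill moves more likely to be accepted, which lets the chain escape local minima at the cost of slower convergence.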

  14. Movement Strategy The new position is randomly selected as follows: • We compute the set of points that preserve the distances to the adjacent neighbours of the aminoacid (a circle or a sphere); • We randomly select a point in this set, close to the current position; • We randomly add a small offset to this point.
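For an aminoacid with two neighbours, the points that preserve both distances form a circle around the axis joining the neighbours, so a nearby point on that circle can be reached by a small rotation about the axis. The Python sketch below illustrates this under stated assumptions: the helper names, `max_angle` and `max_offset` are hypothetical, and the single-neighbour (sphere) case is not covered.

```python
import math
import random


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dist(a, b):
    return math.sqrt(dot(sub(a, b), sub(a, b)))


def rotate_about_axis(p, a, b, theta):
    """Rotate p by theta around the line through a and b (Rodrigues' formula).
    The result keeps the same distances to a and to b as p."""
    k = scale(sub(b, a), 1.0 / dist(a, b))  # unit vector along the axis
    v = sub(p, a)
    v_rot = add(add(scale(v, math.cos(theta)),
                    scale(cross(k, v), math.sin(theta))),
                scale(k, dot(k, v) * (1.0 - math.cos(theta))))
    return add(a, v_rot)


def propose_position(p, prev_nb, next_nb, max_angle, max_offset, rng=random):
    """Pick a nearby point on the distance-preserving circle, then perturb it."""
    theta = rng.uniform(-max_angle, max_angle)
    on_circle = rotate_about_axis(p, prev_nb, next_nb, theta)
    offset = tuple(rng.uniform(-max_offset, max_offset) for _ in range(3))
    return add(on_circle, offset)


a, b, p = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 0.0)
q = propose_position(p, a, b, max_angle=0.3, max_offset=0.05,
                     rng=random.Random(1))
print(q, dist(q, a), dist(q, b))
```

The final offset deliberately breaks the exact distance constraint by a small amount, which is why a bond-distance penalty in the energy function remains useful.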

  15. Testing Model [Figure: schematic of an aminoacid, with labels N, H, Cα, H, C′, O and the side chain] • Each aminoacid is represented as a single center of interaction, which corresponds to the Cα atom. • The energy function consists of four terms, which take into account local and global interactions. • This model is very simple, but it served as a test for the framework.

  16. Energy Function The Energy Function we use is

      $$E(\vec{s}) = \eta_b E_b(\vec{s}) + \eta_a E_a(\vec{s}) + \eta_t E_t(\vec{s}) + \eta_c E_c(\vec{s})$$

      $E_b(\vec{s})$ is the Bond Distance term

      $$E_b(\vec{s}) = \sum_{1 \le i \le n-1} \left( r(s_i, s_{i+1}) - r_0 \right)^2$$

      $E_a(\vec{s})$ is the Bond Angle Bend term

      $$E_a(\vec{s}) = \sum_{i=1}^{n-2} -\log\left( a_1 e^{-\left(\frac{\beta_i - \beta_1}{\sigma_1}\right)^2} + a_2 e^{-\left(\frac{\beta_i - \beta_2}{\sigma_2}\right)^2} \right)$$

      [Plot: horizontal axis in radians]
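The bond distance and bond angle terms can be evaluated directly from a list of Cα coordinates. The Python sketch below (Python 3.8+ for `math.dist` and multi-argument `math.hypot`) assumes that the angle β_i is the one formed by three consecutive centers; all parameter values in the example are placeholders, since the slides do not give them.

```python
import math


def bond_energy(coords, r0):
    """Sum over consecutive pairs of (r(s_i, s_{i+1}) - r0)^2."""
    return sum((math.dist(coords[i], coords[i + 1]) - r0) ** 2
               for i in range(len(coords) - 1))


def bend_angle(p, q, r):
    """Angle at q, in radians, between the directions q->p and q->r."""
    u = [a - b for a, b in zip(p, q)]
    v = [a - b for a, b in zip(r, q)]
    cos_beta = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos_beta)))  # clamp rounding noise


def angle_energy(coords, a1, beta1, sigma1, a2, beta2, sigma2):
    """Sum over consecutive triples of -log of a two-Gaussian mixture."""
    total = 0.0
    for i in range(len(coords) - 2):
        beta = bend_angle(coords[i], coords[i + 1], coords[i + 2])
        mix = (a1 * math.exp(-((beta - beta1) / sigma1) ** 2)
               + a2 * math.exp(-((beta - beta2) / sigma2) ** 2))
        total -= math.log(mix)
    return total


# Toy chain of four centers; every numeric parameter here is a placeholder.
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.5)]
print(bond_energy(chain, r0=1.0))
print(angle_energy(chain, a1=1.0, beta1=1.6, sigma1=0.3,
                   a2=0.5, beta2=2.0, sigma2=0.4))
```

The negative logarithm turns a probability-like mixture over angles into an energy, so the most likely angles are the ones with the lowest energy.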

  17. Energy Function (continued)

      $E_t(\vec{s})$ is the Torsional Angle term

      $$E_t(\vec{s}) = \sum_{i=1}^{n-3} -\log\left( a_1 e^{-\frac{(\Phi_i - \phi_1)^2}{(\sigma_1 + \sigma_0)^2}} + a_2 e^{-\frac{(\Phi_i - \phi_2)^2}{(\sigma_2 + \sigma_0)^2}} \right)$$

      $E_c(\vec{s})$ is the Contact Interaction term

      $$E_c(\vec{s}) = \sum_{i=1}^{n-3} \sum_{j=i+3}^{n} \left[ \left| Pot(s_i, s_j) \right| \left( \frac{r_0(s_i, s_j)}{r(s_i, s_j)} \right)^{12} + Pot(s_i, s_j) \left( \frac{r_0(s_i, s_j)}{r(s_i, s_j)} \right)^{6} \right]$$

      [Plots: one with horizontal axis in radians; one of the contact potential, with vertical axis labelled Potential]
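A possible evaluation of the torsional and contact terms is sketched below in Python (3.8+ for `math.dist`). Several points are assumptions of the sketch rather than statements from the talk: Φ_i is taken to be the dihedral angle of four consecutive centers, the Gaussian exponents are taken to be negative by analogy with the bond angle term, and `pot` and `r0` are hypothetical callables indexed by position in the chain.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in (-pi, pi] defined by four consecutive points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.atan2(y, x)


def torsion_energy(coords, a1, phi1, sigma1, a2, phi2, sigma2, sigma0):
    """Sum over consecutive quadruples of -log of a two-Gaussian mixture."""
    total = 0.0
    for i in range(len(coords) - 3):
        phi = dihedral(*coords[i:i + 4])
        mix = (a1 * math.exp(-((phi - phi1) ** 2) / (sigma1 + sigma0) ** 2)
               + a2 * math.exp(-((phi - phi2) ** 2) / (sigma2 + sigma0) ** 2))
        total -= math.log(mix)
    return total


def contact_energy(coords, pot, r0):
    """12-6 style sum over pairs at least three positions apart."""
    n = len(coords)
    total = 0.0
    for i in range(n - 3):
        for j in range(i + 3, n):
            ratio = r0(i, j) / math.dist(coords[i], coords[j])
            p = pot(i, j)
            total += abs(p) * ratio ** 12 + p * ratio ** 6
    return total


# Toy chain; the potential table and all parameters are placeholders.
chain = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
         (1.0, 0.5, 1.0), (2.0, 0.5, 1.5)]
print(torsion_energy(chain, 1.0, 0.5, 0.4, 0.5, -2.0, 0.6, 0.1))
print(contact_energy(chain, pot=lambda i, j: -1.0, r0=lambda i, j: 1.5))
```

With this form, a pair with negative `pot` has zero contact energy at distance `r0` and is attracted beyond it, while a pair with positive `pot` is repelled at every distance.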

  18. Implementation The code is implemented in Mozart. There are two classes: • Protein, which implements the simulation predicate and coordinates the processes associated with the individual aminoacids; • Amino, which describes a single aminoacid and implements all the methods related to its actions, such as updatePosition, ask and tell.
