Local Maximal Stack Scores with General Loop Penalty Function

EVA 2005, Gothenburg. Niels Richard Hansen.


  2. Local Maximal Stack Scores with General Loop Penalty Function. EVA 2005, Gothenburg. Niels Richard Hansen.
     This talk is based on two papers:
     Asymptotics for Local Maximal Stack Scores with General Loop Penalty Function. To be submitted shortly.
     The Maximum of a Random Walk Reflected at a General Barrier. To appear in Ann. Appl. Probab.

  8. RNA-structures
     RNA molecules are sequences of nucleotides, some of which form functionally important structures.
     An RNA molecule is represented as a sequence $X_1 \ldots X_n$ of letters from the alphabet $\{A, C, G, U\}$.
     Its (secondary) structure is a graph with vertex set $\{1, \ldots, n\}$.
     The graph is a partial matching: each vertex belongs to at most one edge, and there are no self-loops.
     Typically, edges between near neighbours (sharp turns) are not allowed.
     Typically, pseudo-knots are not allowed: pairs of edges of the form $\{i_1, j_1\}$ and $\{i_2, j_2\}$ with $i_1 < i_2 < j_1 < j_2$ are excluded.
     An edge represents a hydrogen bond between nucleotides.
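
The constraints on this slide can be checked mechanically. The following is a minimal sketch, not taken from the talk: the function name `is_valid_structure` and the parameter `min_loop` are hypothetical, and since the slide does not say how near "near neighbours" are, the threshold of 3 unpaired vertices is an assumption.

```python
def is_valid_structure(n, edges, min_loop=3):
    """Check that `edges` is an admissible secondary structure on 1..n.

    `min_loop` is an assumed threshold: an edge {i, j} counts as a sharp
    turn when fewer than `min_loop` vertices lie strictly between i and j.
    """
    seen = set()
    pairs = []
    for a, b in edges:
        i, j = min(a, b), max(a, b)
        if i < 1 or j > n or i == j:      # out of range, or a self-loop
            return False
        if i in seen or j in seen:        # a vertex in more than one edge
            return False
        if j - i - 1 < min_loop:          # sharp turn
            return False
        seen.update((i, j))
        pairs.append((i, j))

    # No pseudo-knots: reject i1 < i2 < j1 < j2.
    pairs.sort()                          # sorted by left end, so i1 < i2
    for idx, (i1, j1) in enumerate(pairs):
        for i2, j2 in pairs[idx + 1:]:
            if i2 < j1 < j2:
                return False
    return True
```

For instance, the nested edges `[(1, 10), (2, 9), (3, 8)]` on 10 vertices pass the check, while the crossing pair `[(1, 6), (3, 9)]` is rejected as a pseudo-knot.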

  10. RNA-structures
      [Figure: an example RNA molecule from the nematode C. elegans, drawn as a hairpin whose two strands form six stacks of 4, 3, 11, 6, 5 and 4 base pairs, separated by short unpaired bulges.]
      Xiong and Waterman (1997) show strong limit results for the maximum of (minus) the free energy score of RNA-structures.
      The free energy score is an additive score over the hydrogen-bonded nucleotides (edges) plus linear penalties on the lengths of the loops (unpaired vertices).
      The score depends on a parameter vector $\alpha$.

  13. Strong Limits
      Let $X_1, \ldots, X_n$ be an iid RNA-sequence. Let $T_{i,j}$ denote the maximal structure score for $X_i, \ldots, X_j$ for $i < j$, and
      $$M_n = \max\Big\{ \max_{1 \le i < j \le n} T_{i,j},\ 0 \Big\}.$$
      Relying on subadditive techniques, Xiong and Waterman show that
      $$\lim_{n \to \infty} \frac{1}{n} T_{1,n} = a(\alpha) \quad \text{a.s.}$$
      If $a(\alpha) > 0$,
      $$\lim_{n \to \infty} \frac{1}{n} M_n = a(\alpha) \quad \text{a.s.}$$
      and if $a(\alpha) < 0$,
      $$\lim_{n \to \infty} \frac{1}{\log n} M_n = b(\alpha) \quad \text{a.s.}$$

  15. A Conjecture
      In the logarithmic phase, $a(\alpha) < 0$, Xiong and Waterman conjecture that
      $$P(M_n > t) \simeq 1 - \exp\big(-K(\alpha)\, n \exp(-t/b(\alpha))\big) \tag{1}$$
      for suitably large $n$ and $t$.
      For a (quite restrictive) class of stack/hairpin-loop structures we show such a result. Our result covers situations corresponding to $a(\alpha) = 0$ in which (1) nevertheless holds.
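
As a reading aid (a short worked step, not from the slides), approximation (1) can be put in standard Gumbel form by recentring $t$. Assuming $b(\alpha) > 0$ and writing $K = K(\alpha)$, $b = b(\alpha)$:

```latex
% Substitute t = b (log(K n) + x) into (1):
\begin{align*}
K\, n \exp(-t/b)
  &= K\, n \exp\big(-\log(K n) - x\big) = e^{-x}, \\
P\big(M_n > b \log(K n) + b\,x\big)
  &\simeq 1 - \exp\big(-e^{-x}\big).
\end{align*}
```

So under (1) the maximum $M_n$ is centred at $b \log(K n)$ with fluctuations of order $b$, which agrees with the strong limit $M_n / \log n \to b(\alpha)$ in the logarithmic phase.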

  18. Local scores
      We proceed as follows: choose functions $f : \{A, C, G, U\}^2 \to \mathbb{R}$ (non-lattice) and $g : \mathbb{N}_0 \to (-\infty, 0]$.
      For $1 \le i < j \le n$ define
      $$T_{i,j} = \max_{-2 \le 2\delta < j-i} \left\{ \sum_{k=0}^{\delta} f(X_{i+k}, X_{j-k}) + g(j - i - 2\delta - 1) \right\}.$$
      The three parts of the score correspond to the decomposition
      $$X_1 \ldots X_{i-1}\ \underbrace{X_i \ldots X_{i+\delta}}_{\text{stack},\ \delta+1}\ \underbrace{X_{i+\delta+1} \ldots X_{j-\delta-1}}_{\text{hairpin-loop},\ j-i-2\delta-1}\ \underbrace{X_{j-\delta} \ldots X_j}_{\text{stack},\ \delta+1}\ X_{j+1} \ldots X_n.$$
      Let $M_n = \max_{1 \le i < j \le n} T_{i,j}$.
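
The definition of $T_{i,j}$ can be evaluated directly by trying every admissible $\delta$. The sketch below is illustrative only: `toy_f` and `toy_g` are hypothetical choices (a +1/-1 pairing score and a linear loop penalty with `toy_g(0) = 0`), and `toy_f` is lattice-valued, so it does not satisfy the non-lattice condition the talk requires for its asymptotic results.

```python
import math

WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

def toy_f(a, b):
    """Hypothetical pairing score: +1 for A-U and C-G pairs, -1 otherwise."""
    return 1.0 if (a, b) in WATSON_CRICK else -1.0

def toy_g(m):
    """Hypothetical loop penalty, linear in the loop length m >= 0."""
    return -0.5 * m

def stack_score(x, i, j, f, g):
    """T_{i,j} for 1 <= i < j <= len(x), straight from the definition."""
    best = -math.inf
    delta = -1                       # delta = -1 means an empty stack
    while 2 * delta < j - i:
        # x is 0-indexed, so X_{i+k} is x[i + k - 1].
        stack = sum(f(x[i + k - 1], x[j - k - 1]) for k in range(delta + 1))
        best = max(best, stack + g(j - i - 2 * delta - 1))
        delta += 1
    return best

def local_max(x, f, g):
    """M_n: the maximum of T_{i,j} over all 1 <= i < j <= n."""
    n = len(x)
    return max(stack_score(x, i, j, f, g)
               for i in range(1, n + 1) for j in range(i + 1, n + 1))
```

With `x = "GGGAAACCC"`, the best choice for $(i, j) = (1, 9)$ is a stack of three G-C pairs around the loop `AAA`, giving `stack_score(x, 1, 9, toy_f, toy_g) == 1.5`. This brute-force evaluation costs O(n) per pair $(i, j)$.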

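  23. The Recursion
      The scores $T_{i,j}$ fulfill the recursion
      $$T_{i,j} = \max\{ T_{i+1,j-1} + f(X_i, X_j),\ g(j - i + 1) \}.$$
      $$\begin{array}{c|ccccc}
       & X_1 & X_2 & X_3 & X_4 & X_5 \\ \hline
      X_1 & g(1) & T_{1,2} & T_{1,3} & T_{1,4} & T_{1,5} \\
      X_2 & 0 & g(1) & T_{2,3} & T_{2,4} & T_{2,5} \\
      X_3 & & 0 & g(1) & T_{3,4} & T_{3,5} \\
      X_4 & & & 0 & g(1) & T_{4,5} \\
      X_5 & & & & 0 & g(1)
      \end{array}$$
      Each entry $T_{i,j}$ is computed from its lower-left neighbour $T_{i+1,j-1}$ (the ր arrows on the slide).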
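
The recursion and the boundary values in the table ($g(1)$ on the main diagonal, 0 just below it) translate into a short dynamic program. This is a sketch under assumptions: `toy_f` and `toy_g` are hypothetical example functions, not the ones used in the talk, and the name `score_table` is made up for illustration.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

def toy_f(a, b):
    """Hypothetical pairing score: +1 for A-U and C-G pairs, -1 otherwise."""
    return 1.0 if (a, b) in WATSON_CRICK else -1.0

def toy_g(m):
    """Hypothetical loop penalty, linear in the loop length m >= 0."""
    return -0.5 * m

def score_table(x, f, g):
    """Fill T[i][j] for 1 <= i <= j <= n using the recursion."""
    n = len(x)
    # Indices run 0..n+1 so that T[i + 1][j - 1] always exists;
    # the sub-diagonal entries T[i + 1][i] keep their initial value 0.
    T = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        T[i][i] = g(1)                    # main diagonal of the table
    for d in range(1, n):                 # d = j - i
        for i in range(1, n - d + 1):
            j = i + d
            # x is 0-indexed, so X_i is x[i - 1].
            T[i][j] = max(T[i + 1][j - 1] + f(x[i - 1], x[j - 1]), g(d + 1))
    return T

def local_max_from_table(T, n):
    """M_n read off the strictly upper-triangular part of the table."""
    return max(T[i][j] for i in range(1, n + 1) for j in range(i + 1, n + 1))
```

Filling the table takes O(n^2) operations in total, since each entry needs only one previously computed neighbour.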

  25. The Diagonals
      Suppose $(X_k)_{k \in \mathbb{Z}}$ is a doubly infinite sequence of iid variables. Define recursively
      $$T^0_k = \max\{ T^0_{k-1} + f(X_{-k}, X_k),\ g(2k) \}, \qquad T^0_0 = 0,$$
      and
      $$T^1_k = \max\{ T^1_{k-1} + f(X_{-k}, X_k),\ g(2k+1) \}, \qquad T^1_0 = g(1).$$
      Then
      $$T_{i,j} \overset{\mathcal{D}}{=} \begin{cases} T^0_{(j-i+1)/2} & \text{if } j - i \text{ is odd,} \\ T^1_{(j-i)/2} & \text{if } j - i \text{ is even.} \end{cases}$$

  26. Reflected Random Walks
      The processes $(T^i_k)_{k \ge 0}$, $i = 0, 1$, are random walks reflected at $g$.
      [Figure: three plots of simulated sample paths, horizontal axis 0 to 200, vertical axis -200 to 150, for the barriers g(n) = 0, g(n) = -15 log(n) and g(n) = -n.]
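
Sample paths like those in the figure can be generated with a few lines of code. The sketch below is an illustration, not the code behind the slide: the function `reflected_walk`, the Gaussian increments with mean -0.5, and the seed are all assumptions. In the talk the increments are $f(X_{-k}, X_k)$ for iid letters.

```python
import math
import random

def reflected_walk(increments, g, start=0.0, offset=0):
    """Path of T_k = max(T_{k-1} + Y_k, g(2k + offset)), with T_0 = start.

    offset=0, start=0 mimics T^0; offset=1, start=g(1) mimics T^1.
    """
    path = [start]
    for k, y in enumerate(increments, start=1):
        path.append(max(path[-1] + y, g(2 * k + offset)))
    return path

# The three barriers named in the figure.
barriers = {
    "g(n) = 0": lambda m: 0.0,
    "g(n) = -15 log(n)": lambda m: -15.0 * math.log(m),
    "g(n) = -n": lambda m: -float(m),
}

# Assumed increment distribution: Gaussian with negative drift.
rng = random.Random(0)
steps = [rng.gauss(-0.5, 1.0) for _ in range(200)]
paths = {name: reflected_walk(steps, g) for name, g in barriers.items()}
```

Each path stays on or above its barrier by construction. A lower barrier lets the walk drift further down before it is pushed back, which is the qualitative difference between the three plots.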

  27. Reflected Random Walks
      If $M^i := \sup_{k \ge 0} T^i_k < \infty$ a.s. and $\theta^* > 0$ solves
      $$E \exp(\theta f(X_{-1}, X_1)) = 1,$$
      then
      $$P(M^i > x) \sim K^*_i \exp(-\theta^* x) \quad \text{for } x \to \infty.$$
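
The exponent $\theta^*$ can be found numerically once $f$ and the letter distribution are fixed. The following sketch assumes independent letters with given probabilities and uses plain bisection. The function `theta_star` and the toy score are hypothetical, and the existence check (negative mean, some positive score) is a standard sufficient condition rather than something stated on the slide.

```python
import math
from itertools import product

def theta_star(f, probs, tol=1e-12):
    """Positive root of E exp(theta * f(X, Y)) = 1 for independent X, Y ~ probs."""
    pairs = [((a, b), pa * pb)
             for (a, pa), (b, pb) in product(probs.items(), repeat=2)]

    def phi(theta):
        return sum(p * math.exp(theta * f(a, b)) for (a, b), p in pairs)

    mean = sum(p * f(a, b) for (a, b), p in pairs)
    if mean >= 0 or all(f(a, b) <= 0 for (a, b), p in pairs if p > 0):
        raise ValueError("need E f < 0 and P(f > 0) > 0 for a positive root")

    # phi is convex with phi(0) = 1 and phi'(0) < 0, so phi < 1 on
    # (0, theta*) and phi > 1 beyond it. Bracket the root, then bisect.
    hi = 1.0
    while phi(hi) <= 1.0:
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) > 1.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Toy example: uniform letters, +1 for A-U and C-G pairs, -1 otherwise.
uniform = {letter: 0.25 for letter in "ACGU"}
toy_pairs = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
toy_theta = theta_star(lambda a, b: 1.0 if (a, b) in toy_pairs else -1.0, uniform)
```

In the toy example the equation reads $\tfrac14 e^{\theta} + \tfrac34 e^{-\theta} = 1$, whose positive root is $\theta^* = \log 3$, so `toy_theta` should agree with `math.log(3)` up to the tolerance.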
