LinearFold: Linear-Time Approximate RNA Folding by 5'-to-3' Dynamic Programming and Beam Search
Liang Huang* (Oregon State University & Baidu Research USA)
Joint work with He Zhang**, Dezhong Deng**, Kai Zhao, Kaibo Liu, David Hendrix and David Mathews
ISMB 2019 Proceedings Talk
* corresponding author ** equal contribution
x GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA
y (((((((..((((........)))).(((((.......))))).....(((((.......))))))))))))....
[Figure: secondary structure drawing of the example sequence, positions 1 to 76]
First O(n) (approx.) RNA folding algorithm and server (linearfold.org), with even higher accuracy than O(n^3) algorithms
RNA Secondary Structure Prediction
allowed pairs: G-C, A-U, G-U
assume no crossing pairs (no pseudoknots)
example: transfer RNA (tRNA)
input x GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA
output y (((((((..((((........)))).(((((.......))))).....(((((.......))))))))))))....
[Figure: secondary structure drawing of the tRNA, positions 1 to 76, shown beside a parse tree for "the man bit the dog": S -> NP VP, NP -> DT NN, VP -> VB NP; both annotated O(n^3)]
problem: standard structure prediction algorithms are way too slow: O(n^3)
solution: adapt my linear-time dynamic programming algorithms from parsing
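The input/output convention on this slide can be made concrete with a small sketch. It reads a dot-bracket string y left to right with a stack and checks each recovered pair against the allowed pairs listed above. The function name `parse_dot_bracket` and the 0-based indexing are assumptions made for illustration; this is not code from LinearFold.

```python
# Allowed base pairs from the slide: G-C, A-U, G-U (in either orientation).
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(x, y):
    """Return the sorted 0-based (i, j) pairs encoded by y, validated against x.

    Hypothetical helper: balanced brackets can never cross, so a well-formed
    dot-bracket string is pseudoknot-free by construction.
    """
    if len(x) != len(y):
        raise ValueError("sequence and structure must have the same length")
    stack = []   # positions of '(' that are still waiting for a partner
    pairs = []
    for j, symbol in enumerate(y):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            i = stack.pop()
            if (x[i], x[j]) not in ALLOWED_PAIRS:
                raise ValueError(f"pair {x[i]}-{x[j]} at ({i}, {j}) is not allowed")
            pairs.append((i, j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# A toy hairpin (not the tRNA from the slide): three G-C pairs around a loop.
print(parse_dot_bracket("GGGAAACCC", "(((...)))"))
```

The single left-to-right pass with a stack takes time linear in the sequence length, but it only reads a given structure; finding the best structure is the expensive part that the slide describes as O(n^3).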
Results: LinearFold is Much Faster and More Accurate
[Figure, panel A: running time (seconds) vs. sequence length (0 to 3000 nt). Empirical scaling of existing tools: Vienna RNAfold ~n^2.4, CONTRAfold MFE ~n^2.2. Our work: LinearFold-V ~n^1.2, LinearFold-C ~n^1.1.]
[Figure, remaining panels: Precision and Recall (axis range 40 to 80) by RNA family, comparing standard O(n^3) search with LinearFold O(n) search; asterisks mark some families.]
From Linguistics to Biology x GCGGGAAUAGCUCAGUUGGUAGAGCACGACCUUGCCAAGGUCGGGGUCGCGAGUUCGAGUCUCGUUUCCCGCUCCA y (((((((..((((........)))).(((((.......))))).....(((((.......))))))))))))....
Computational Linguistics => Computational Biology
linguistics
  1955 Chomsky: context-free grammars (CFGs)
compiler theory
  1958 Backus & Naur: CFGs for programming languages
  1965 Knuth: LR parsing for restricted CFGs: O(n) (example: x = y + 3;)
comp. linguistics
  1964 Cocke, 1965 Kasami, 1967 Younger: CKY, bottom-up dynamic programming, O(n^3) for all CFGs (example: "the man bit the dog")
  1986 Tomita: Generalized LR for all CFGs: O(n^3)
  2010 Huang & Sagae: O(n) (approx.) DP for all CFGs
computational biology
  1978 Nussinov; 1981 Zuker & Stiegler: O(n^3) RNA folding (dynamic programming)
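To illustrate where the O(n^3) in the computational biology column comes from, here is a minimal sketch of a Nussinov-style bottom-up dynamic program that only maximizes the number of allowed base pairs. It is a simplified stand-in, not LinearFold and not the energy models used by Vienna RNAfold or CONTRAfold; the function name `max_pairs` and the `min_loop` parameter (minimum number of unpaired bases inside a hairpin) are assumptions made for the example.

```python
# Allowed base pairs, written as two-letter strings (both orientations).
ALLOWED = {"GC", "CG", "AU", "UA", "GU", "UG"}


def max_pairs(x, min_loop=3):
    """Maximum number of non-crossing allowed pairs in x (Nussinov-style DP).

    Three nested loops (span, start i, split point k) give O(n^3) time, the
    same shape as CKY parsing over spans of a sentence.
    """
    n = len(x)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within x[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):      # shorter spans cannot hold a pair
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]           # case 1: position j is unpaired
            for k in range(i, j - min_loop): # case 2: position j pairs with k
                if x[k] + x[j] in ALLOWED:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


# Toy hairpin: three G-C pairs closing a loop of three unpaired bases.
print(max_pairs("GGGAAACCC"))
```

The table is filled from short spans to long spans, which is the bottom-up order named in the CKY entry of the timeline. The talk's contribution, by contrast, scans 5'-to-3' with beam search to reach approximate O(n) time.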