
CSI5126. Algorithms in bioinformatics: RNA Secondary Structure Search

CSI5126. Algorithms in bioinformatics. RNA Secondary Structure Search Problem. Marcel Turcotte, School of Electrical


  1. Other paradigms
     - Reporting sub-optimal structures (MFOLD, SFOLD)
     - Partition function and McCaskill's calculation of the $P_{ij}$'s
     - Folding kinetics, identifying riboswitches
     - MFE secondary structure for interacting RNA molecules
     - Partition function for the secondary structure of interacting RNA molecules
     - Non-coding RNA (ncRNA gene) identification (EvoFold, RNAz…)


  7. Now what?
     A secondary structure was inferred!
     - It can be analyzed in order to propose new experiments, to propose a mechanism of action, or to develop novel therapeutic approaches (a new drug, for instance).
     - It can be used for finding new members of its family (homologues); this requires adapted database searching techniques.
     - It can serve as a starting point for predicting the three-dimensional structure.


  12. Database search problem
      Find all sequences matching a user-specified secondary structure motif, or all the sequences that can be folded into a user-specified structure.
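
The second form of the search problem can be illustrated with a minimal sketch, assuming the target structure is given in dot-bracket notation and that "can be folded into" only means every bracketed pair is a Watson-Crick or G-U pair. The names `can_fold_into` and `search`, and the toy database, are hypothetical and not from the slides; real tools also take thermodynamic stability into account.

```python
# Allowed base pairs: Watson-Crick plus the G-U wobble (an assumption of this sketch).
PAIRS = {("a", "u"), ("u", "a"), ("c", "g"), ("g", "c"), ("g", "u"), ("u", "g")}


def can_fold_into(seq: str, structure: str) -> bool:
    """True if every bracketed pair of `structure` is an allowed base pair in `seq`."""
    if len(seq) != len(structure):
        return False
    stack = []  # positions of '(' still waiting for their partner
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                return False  # unbalanced structure
            i = stack.pop()
            if (seq[i].lower(), seq[j].lower()) not in PAIRS:
                return False
        elif symbol != ".":
            return False  # unknown structure symbol
    return not stack


def search(database: dict[str, str], structure: str) -> list[str]:
    """Names of the database sequences compatible with the structure."""
    return [name for name, seq in database.items() if can_fold_into(seq, structure)]


toy_db = {"hit": "gggaaaaccc", "miss": "gggaaaaaaa"}
print(search(toy_db, "(((....)))"))
```

The scan is linear in the length of each sequence, which is why this kind of filter scales to large databases.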

  13. Non-probabilistic approaches
      - The first practical approaches were non-probabilistic.
      - A description language allows users to represent structural motifs and search databases.
      - RNAMOT, RNABOB, PatScan, and RNAMOTIF

  14. parms
          wc += gu;
      descr
          h5(minlen=6,maxlen=7)
              ss(len=2)
              h5(minlen=3,maxlen=4) ss(minlen=4,maxlen=11) h3
              ss(len=1)
              h5(minlen=4,maxlen=5) ss(len=7) h3
              ss(minlen=4,maxlen=21)
              h5(minlen=4,maxlen=5) ss(len=7) h3
          h3
          ss(len=4)
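
To make the matching semantics of such a descriptor concrete, here is a minimal sketch for the simplest case only: one helix enclosing one single-stranded loop (`h5 ss h3`). It is a brute-force scan, not the algorithm of any of the tools named in this deck. `wc += gu` is read here as "G-U pairs are allowed in helices", and the function name and parameters are hypothetical.

```python
# Pairs allowed inside a helix: Watson-Crick plus G-U (mirrors "wc += gu").
PAIRS = {"au", "ua", "cg", "gc", "gu", "ug"}


def find_hairpins(seq, stem_min, stem_max, loop_min, loop_max):
    """Return (start, stem_len, loop_len) for every exact h5-ss-h3 match."""
    seq = seq.lower()
    n = len(seq)
    hits = []
    for start in range(n):
        for stem in range(stem_min, stem_max + 1):
            for loop in range(loop_min, loop_max + 1):
                end = start + 2 * stem + loop  # one past the last position
                if end > n:
                    break  # longer loops cannot fit either
                # The k-th base of the 5' strand pairs with the k-th base
                # counted backwards from the end of the 3' strand.
                if all(seq[start + k] + seq[end - 1 - k] in PAIRS
                       for k in range(stem)):
                    hits.append((start, stem, loop))
    return hits


print(find_hairpins("gggaaaaccc", 3, 3, 4, 4))
```

A single non-pairing position makes the candidate disappear from the result, which is the hit-or-miss behaviour of hard patterns.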

  15. RNAMOT
      - Gautheret D., Major F. & Cedergren R. (1990) Pattern searching/alignment with RNA primary and secondary structures: an effective descriptor for tRNA. Comp. Appl. Biosci. 6, 325-331.
      - Laferriere A., Gautheret D. & Cedergren R. (1994) An RNA pattern matching program with enhanced performances and portability. Comp. Appl. Biosci. 10, 209-210.
      - rna.igmors.u-psud.fr/gautheret/download

  16. RNABOB
      RNABOB is an implementation of D. Gautheret's RNAMOT, but with a different underlying algorithm using a non-deterministic finite state machine with node rewriting rules. (Computer scientists would probably cringe in horror. It works, and it's fast, but is it street legal in a computer science department? Who knows.) If you're looking for an RNA motif that fits a hard consensus pattern (a la PROSITE patterns, but with base-pairing), you might check out RNABOB.
      http://eddylab.org/software.html

  17. RNAMOTIF
      - Macke et al. (2001) Nuc. Acids. Res. 29(22):4724-4735.
      - Sophisticated scripting language
      - Matches can be ranked using a user-defined scoring function
      - Minimum free energy can be used in the definition of the scoring function
      - casegroup.rutgers.edu/casegr-sh-2.5.html

  18. Discussion
      What are the main limitations?
      - These computer programs are practical and can be applied to large data sets.
      - A hard consensus pattern means hit-or-miss.
      - The major difficulties arise from the subjectivity in deriving the best descriptor for a family of sequences.
      - It can be quite difficult to design a pattern with both high sensitivity and high specificity.
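
The trade-off in the last point can be quantified with the usual definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below assumes the true family members are known for a benchmark database. The function name and the toy identifiers are hypothetical, and both the family and its complement are assumed non-empty.

```python
def sensitivity_specificity(hits: set[str], family: set[str], database: set[str]):
    """Evaluate a descriptor from the set of sequences it reports as hits."""
    tp = len(hits & family)             # family members that were found
    fn = len(family - hits)             # family members that were missed
    fp = len(hits - family)             # reported sequences outside the family
    tn = len(database - family - hits)  # correctly ignored sequences
    return tp / (tp + fn), tn / (tn + fp)


database = {f"s{i}" for i in range(1, 11)}  # ten toy sequence identifiers
family = {"s1", "s2", "s3", "s4"}           # true members of the family
hits = {"s1", "s2", "s3", "s5"}             # what a descriptor reported
print(sensitivity_specificity(hits, family, database))
```

Relaxing a descriptor (wider length ranges, more allowed pairs) tends to raise the first number and lower the second; tightening it does the opposite.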


  23. Discussion
      How can one move away from “hard” patterns?
      - Edit-distance: G. Myers. Approximately matching context-free languages. Information Processing Letters, vol. 54 (2), pp. 85-92, 1995. $O(P^5 N^8 8^P)$, where $P$ is the size of the grammar and $N$ is the length of the string.
      - $k$-mismatches: N. El-Mabrouk, M. Raffinot, J.E. Duchesne, M. Lajoie and N. Luc. Approximate Matching of Secondary Structures. Journal of Bioinformatics and Computational Biology, Vol. 3, No. 2, pp. 317-342, 2005. $O(krpn)$, where $k$ is the error threshold, $n$ is the string size, $p$ is the secondary structure size, and $r$ is the number of “union” symbols.
      - Probabilistic, a principled approach
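
As an illustration of the k-mismatches idea only, the sketch below relaxes a hard hairpin pattern of fixed shape by tolerating up to `k` non-pairing positions in the helix. It is a brute-force scan written for this note; it is not the algorithm of Myers or of El-Mabrouk et al., and all names are hypothetical.

```python
PAIRS = {"au", "ua", "cg", "gc", "gu", "ug"}  # Watson-Crick plus G-U


def stem_mismatches(seq: str, start: int, stem: int, loop: int) -> int:
    """Number of helix positions whose two bases cannot pair."""
    end = start + 2 * stem + loop  # one past the last position of the hairpin
    return sum((seq[start + i] + seq[end - 1 - i]) not in PAIRS
               for i in range(stem))


def find_hairpins_k(seq: str, stem: int, loop: int, k: int):
    """Return (start, mismatches) for hairpins with at most k mismatches."""
    seq = seq.lower()
    span = 2 * stem + loop
    hits = []
    for start in range(len(seq) - span + 1):
        m = stem_mismatches(seq, start, stem, loop)
        if m <= k:
            hits.append((start, m))
    return hits


print(find_hairpins_k("gggaaaaccg", stem=3, loop=4, k=1))
```

With `k=0` the function behaves like a hard pattern; raising `k` trades specificity for sensitivity.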


  30. Transformational grammars
      - Pioneered by Noam Chomsky in the ’50s to model natural languages
      - Formal grammars make it possible to determine whether novel sentences are grammatical or not
      - Transformational grammars are sometimes called generative grammars
      - We look at non-probabilistic grammars first!


  34. Chomsky hierarchy of transformational grammars
      - Increasing order of expressivity, but also increasing order of computational resources.
      - Each class of languages has its associated machine that serves for parsing (accepting, deciding, recognizing) sentences of this language.
      [Figure: nested classes of grammars, from outermost to innermost: unrestricted, context-sensitive, context-free, regular]


  36. Transformational grammars: definitions
      - Constituted of symbols and rewriting rules (also called production rules) having the following form: $\alpha \to \beta$
      - 2 types of symbols: terminal symbols and non-terminal symbols
      - The left-hand side of a rule contains at least one non-terminal symbol, which is rewritten into the right-hand side of the rule
      - Terminal symbols represent instances of the language, here nucleotides, and will be represented by lower-case letters
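
A rewriting step can be sketched in a few lines. The sketch assumes sentential forms are plain strings in which upper-case letters are non-terminals and lower-case letters are terminals, following the convention above; the function name `apply_rule` and the example rules are hypothetical.

```python
def apply_rule(form: str, lhs: str, rhs: str) -> str:
    """Rewrite the leftmost occurrence of `lhs` in `form` into `rhs`."""
    if not any(ch.isupper() for ch in lhs):
        raise ValueError("the left-hand side needs at least one non-terminal")
    if lhs not in form:
        raise ValueError("the rule does not apply to this sentential form")
    return form.replace(lhs, rhs, 1)


print(apply_rule("aWb", "W", "cg"))   # rule W -> cg
print(apply_rule("aWb", "aW", "ga"))  # rule aW -> ga
```

Single-letter non-terminals keep the string representation unambiguous; indexed non-terminals would need a list of symbols instead of a string.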


  40. Transformational grammars: definitions
      A small example, a grammar denoted by $G$:
          $S \to gS_1 \mid cS_2$
          $S_1 \to cS_2 \mid \epsilon$
          $S_2 \to gS_1 \mid \epsilon$
      - A derivation is the successive application of the rules starting with $S$ (the start non-terminal): $S \Rightarrow cS_2 \Rightarrow cgS_1 \Rightarrow cgcS_2 \Rightarrow cgcgS_1 \Rightarrow cgcg$
      - A string is accepted by the grammar if there exists a derivation of the string from $S$.
      - The language generated by $G$, denoted $L(G)$, is all the strings that can be derived from $S$: $\{ w \mid S \Rightarrow^{\star} w \}$.
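
The derivation above can be reproduced mechanically. The sketch below encodes $G$ as a dictionary and follows, for each input letter, the only production that emits it; it relies on the observation that in this particular grammar no non-terminal has two alternatives starting with the same terminal. The encoding and the name `derive` are choices made for this illustration.

```python
# Productions of G. Each alternative is (terminal, next non-terminal);
# None stands for the empty string (epsilon).
G = {
    "S": [("g", "S1"), ("c", "S2")],
    "S1": [("c", "S2"), None],
    "S2": [("g", "S1"), None],
}


def derive(word: str, grammar=G, start="S"):
    """Return the derivation of `word` as a list of sentential forms, or None."""
    prefix, state = "", start
    steps = [state]
    for ch in word:
        for alt in grammar[state]:
            if alt is not None and alt[0] == ch:
                prefix, state = prefix + ch, alt[1]
                steps.append(prefix + state)
                break
        else:
            return None  # no production of `state` emits ch
    if None not in grammar[state]:
        return None  # the trailing non-terminal cannot be erased
    steps.append(prefix)
    return steps


print(" => ".join(derive("cgcg")))
```

A string is accepted exactly when `derive` returns a list, so `derive("gg")` and `derive("")` both return `None`.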


  44. A derivation can be visualized as a parse tree
      - Terminals are leaves and non-terminals are internal nodes
      - What was the input string? Can you enumerate some of the productions of the grammar?
      [Figure: parse tree whose internal nodes are the non-terminals S0 to S18 and whose leaves are nucleotides]


  48. Transformational grammars
      A small example:
          $S \to gS_1 \mid cS_2$
          $S_1 \to cS_2 \mid \epsilon$
          $S_2 \to gS_1 \mid \epsilon$
      - Give examples of sentences accepted (generated) by the grammar.
      - Which class of grammar is this?
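
One way to explore the first question is to enumerate the sentences of the grammar up to a given length. The sketch below does a breadth-first expansion of sentential forms of the shape "terminals followed by one non-terminal", which is the only shape this grammar produces. The dictionary encoding and the name `sentences` are assumptions of this illustration.

```python
from collections import deque

# Each alternative is (terminal, next non-terminal); None stands for epsilon.
G = {
    "S": [("g", "S1"), ("c", "S2")],
    "S1": [("c", "S2"), None],
    "S2": [("g", "S1"), None],
}


def sentences(max_len: int, grammar=G, start="S"):
    """All strings of length <= max_len generated by the grammar."""
    found = []
    queue = deque([("", start)])  # (terminals emitted so far, non-terminal)
    while queue:
        prefix, state = queue.popleft()
        for alt in grammar[state]:
            if alt is None:
                found.append(prefix)  # the non-terminal is erased
            elif len(prefix) < max_len:
                queue.append((prefix + alt[0], alt[1]))
    return sorted(found, key=lambda w: (len(w), w))


print(sentences(4))
```

Every string printed alternates between `c` and `g`, and the empty string is absent because $S$ has no $\epsilon$ alternative.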



  59. Chomsky hierarchy of transformational grammars

      Grammar type        Decidability               Productions
      Regular             finite state automata      W → aW, W → a
      Context-free        push-down automata         W → γ
      Context-sensitive   linear bounded automata    αWβ → αγβ
      Unrestricted        Turing machines            α → β
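The production shapes in this table can be checked mechanically. The following sketch assigns a set of productions to the most restrictive row whose shape every production satisfies. It follows the table literally, so as an assumption of this sketch ε right-hand sides are not accepted as regular and γ is required to be non-empty in the context-sensitive row. The name `classify` and the tuple encoding are illustrative only.

```python
def classify(productions, nonterminals):
    """Return the most restrictive grammar type whose production shape fits.

    productions:  list of (lhs, rhs) pairs, each side a tuple of symbols.
    nonterminals: set of symbols treated as nonterminals.
    """
    def is_regular(lhs, rhs):
        # W -> a  or  W -> aW
        if len(lhs) != 1 or lhs[0] not in nonterminals:
            return False
        if len(rhs) == 1:
            return rhs[0] not in nonterminals
        return (len(rhs) == 2 and rhs[0] not in nonterminals
                and rhs[1] in nonterminals)

    def is_context_free(lhs, rhs):
        # W -> gamma
        return len(lhs) == 1 and lhs[0] in nonterminals

    def is_context_sensitive(lhs, rhs):
        # alpha W beta -> alpha gamma beta, with gamma non-empty (assumption)
        for i, sym in enumerate(lhs):
            if sym not in nonterminals:
                continue
            alpha, beta = lhs[:i], lhs[i + 1:]
            gamma_len = len(rhs) - len(alpha) - len(beta)
            if (gamma_len >= 1 and rhs[:len(alpha)] == alpha
                    and rhs[len(rhs) - len(beta):] == beta):
                return True
        return False

    checks = [("regular", is_regular),
              ("context-free", is_context_free),
              ("context-sensitive", is_context_sensitive)]
    for name, check in checks:
        if all(check(lhs, rhs) for lhs, rhs in productions):
            return name
    return "unrestricted"

NT = {"W", "S"}
print(classify([(("W",), ("a", "W")), (("W",), ("a",))], NT))   # regular
print(classify([(("S",), ("a", "S", "u"))], NT))                # context-free
print(classify([(("a", "W", "b"), ("a", "c", "c", "b"))], NT))  # context-sensitive
print(classify([(("a", "W"), ("W",))], NT))                     # unrestricted
```

Note that this classifies a particular grammar, not the language it generates: a language may well have a more restrictive grammar than the one given.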

  64. Prosite (www.expasy.ch/prosite)
      N-glycosylation site: n-{p}-[st]-{p}
      S0 → nS1
      S1 → aS2 | cS2 | ... | yS2
      S2 → sS3 | tS3
      S3 → a | c | ... | y
      What type of grammar is that?
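The pattern n-{p}-[st]-{p} can also be read as a regular expression. The sketch below translates a small subset of the Prosite syntax (single residues, `x`, `[..]` and `{..}`) and scans a sequence for sites. The helper names `prosite_to_regex` and `find_sites` are hypothetical, repetition counts and anchors are not handled, and the lowercase alphabet follows the slide rather than the usual uppercase convention.

```python
import re

AMINO_ACIDS = "acdefghiklmnpqrstvwy"  # 20 standard residues, lowercase as on the slide

def prosite_to_regex(pattern):
    """Translate a simple Prosite pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        if element == "x":
            parts.append(f"[{AMINO_ACIDS}]")          # any residue
        elif element.startswith("{") and element.endswith("}"):
            excluded = element[1:-1]                   # any residue except these
            allowed = "".join(aa for aa in AMINO_ACIDS if aa not in excluded)
            parts.append(f"[{allowed}]")
        elif element.startswith("[") and element.endswith("]"):
            parts.append(element)                      # one of the listed residues
        else:
            parts.append(re.escape(element))           # a single fixed residue
    return "".join(parts)

def find_sites(pattern, sequence):
    """Start positions (0-based) of all matches, overlapping ones included."""
    regex = prosite_to_regex(pattern)
    return [m.start() for m in re.finditer(f"(?={regex})", sequence)]

print(prosite_to_regex("n-{p}-[st]-{p}"))
# n[acdefghiklmnqrstvwy][st][acdefghiklmnqrstvwy]
print(find_sites("n-{p}-[st]-{p}", "mknasanpsa"))  # [2]
```

The lookahead `(?=...)` is used so that two sites sharing residues are both reported.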

  67. RNA secondary structure
      Write a grammar whose language consists of all the RNA sequences folding into either of the following two stem-loop structures.
      [Figure: two stem-loop structures, each a stem of three N-N’ base pairs closed by a four-base loop]
      S → aAu | cAg | gAc | uAa
      A → aBu | cBg | gBc | uBa
      B → aCu | cCg | gCc | uCa
      C → gaga | agag
      What type of grammar is that?
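Because S, A and B each contribute exactly one base pair and C contributes one of two loops, membership in this language can be tested directly. The sketch below is a hand-written membership check under that reading, not a general parser; the names `PAIRS`, `LOOPS` and `in_language` are illustrative.

```python
# The four pairings used by the productions of S, A and B.
PAIRS = {("a", "u"), ("c", "g"), ("g", "c"), ("u", "a")}
# The two alternatives of C.
LOOPS = {"gaga", "agag"}
STEM_LENGTH = 3  # S, A and B each add one base pair

def in_language(seq):
    """True if seq can be derived from S in the stem-loop grammar."""
    seq = seq.lower()
    if len(seq) != 2 * STEM_LENGTH + 4:
        return False
    # Positions k and -1-k must pair, from the outside of the stem inwards.
    for k in range(STEM_LENGTH):
        if (seq[k], seq[-1 - k]) not in PAIRS:
            return False
    return seq[STEM_LENGTH:-STEM_LENGTH] in LOOPS

print(in_language("gcagagaugc"))  # True
print(in_language("gcagagaugg"))  # False
```

The pairing constraint links position k to position -1-k, a nested dependency between the two ends of the string that the grammar expresses by placing the nonterminal in the middle of each right-hand side.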

  72. Cocke-Younger-Kasami (CYK) algorithm
      CYK is a widely used algorithm for the parsing of context-free grammars (CFG).
      The CFG must first be transformed into its Chomsky normal form (CNF).
      All the productions must be of the form:
        A → BC (exactly two nonterminals), or
        A → a (exactly one terminal)
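A minimal sketch of CYK recognition for a grammar already in CNF is given below. The dictionary encoding (`unary`, `binary`) and the small test grammar for strings of the form a^n b^n are assumptions chosen for illustration and do not come from the lecture.

```python
def cyk(word, unary, binary, start="S"):
    """CYK recognition for a grammar in Chomsky normal form.

    unary:  dict terminal -> set of nonterminals A with A -> a
    binary: dict (B, C)   -> set of nonterminals A with A -> B C
    """
    n = len(word)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive word[i..j] (inclusive).
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][i] = set(unary.get(symbol, ()))
    for span in range(2, n + 1):              # length of the substring
        for i in range(n - span + 1):         # start of the substring
            j = i + span - 1                  # end of the substring
            for k in range(i, j):             # split point: [i..k] and [k+1..j]
                for left in table[i][k]:
                    for right in table[k + 1][j]:
                        table[i][j] |= binary.get((left, right), set())
    return start in table[0][n - 1]

# Hypothetical CNF grammar: S -> A X | A B, X -> S B, A -> a, B -> b
unary = {"a": {"A"}, "b": {"B"}}
binary = {("A", "X"): {"S"}, ("A", "B"): {"S"}, ("S", "B"): {"X"}}
print(cyk("aabb", unary, binary))  # True
print(cyk("aab", unary, binary))   # False
```

The three nested loops over span, start and split point give a running time cubic in the length of the word, and the restriction to exactly two nonterminals per production is what makes a single split point sufficient.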

  78. Cocke-Younger-Kasami (CYK) algorithm
      Converting the production S → g T c to CNF:
      S  → S1 S2
      S1 → g
      S2 → T S4
      S4 → c
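The rewriting of S → g T c shown on this slide can be sketched as two steps: give each terminal its own nonterminal, then split the right-hand side into pairs. The function `to_cnf` below is a hypothetical helper that handles a single production with at least two right-hand-side symbols. It does not deal with ε or unit productions, and its fresh names (X1, X2, ...) differ from the slide's S1, S2, S4.

```python
def to_cnf(lhs, rhs, nonterminals, prefix="X"):
    """Split one production lhs -> rhs (len(rhs) >= 2) into CNF productions."""
    rules = []
    counter = 0

    def fresh():
        nonlocal counter
        counter += 1
        return f"{prefix}{counter}"

    # Step 1: wrap every terminal in its own new nonterminal.
    symbols = []
    for sym in rhs:
        if sym in nonterminals:
            symbols.append(sym)
        else:
            name = fresh()
            rules.append((name, (sym,)))
            symbols.append(name)

    # Step 2: peel off one symbol at a time until two remain.
    head = lhs
    while len(symbols) > 2:
        rest = fresh()
        rules.append((head, (symbols[0], rest)))
        head, symbols = rest, symbols[1:]
    rules.append((head, tuple(symbols)))
    return rules

for left, right in to_cnf("S", ("g", "T", "c"), {"S", "T"}):
    print(left, "->", " ".join(right))
# X1 -> g
# X2 -> c
# S -> X1 X3
# X3 -> T X2
```

Up to renaming (X1 for S1, X3 for S2, X2 for S4), the output has the same shape as the four productions on the slide.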

  86. Cocke-Younger-Kasami (CYK) algorithm
      Write a CFG in CNF for the following stem-loop structure.
      [Figure: stem-loop with base pairs G-C, A-U, U-A closed by a four-base loop of G and A]
      S  → S1 S2     S5 → a         S10 → S11 S12    S15 → g
      S1 → u         S6 → S7 S8     S12 → c          S16 → S17 S18
      S2 → S3 S4     S8 → u         S11 → S13 S14    S17 → a
      S4 → a         S7 → S9 S10    S13 → a          S18 → g
      S3 → S5 S6     S9 → g         S14 → S15 S16
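Each nonterminal in this grammar has exactly one production, so the grammar generates a single sentence. The sketch below expands S recursively to recover it and checks that every production has one of the two CNF shapes. The dictionary encoding and the function name `expand` are assumptions made for illustration.

```python
# One production per nonterminal: a string is a terminal, a pair is A -> B C.
CNF = {
    "S":   ("S1", "S2"),   "S1":  "u",
    "S2":  ("S3", "S4"),   "S4":  "a",
    "S3":  ("S5", "S6"),   "S5":  "a",
    "S6":  ("S7", "S8"),   "S8":  "u",
    "S7":  ("S9", "S10"),  "S9":  "g",
    "S10": ("S11", "S12"), "S12": "c",
    "S11": ("S13", "S14"), "S13": "a",
    "S14": ("S15", "S16"), "S15": "g",
    "S16": ("S17", "S18"), "S17": "a",
    "S18": "g",
}

def expand(symbol):
    """Derive the terminal string generated by a nonterminal."""
    rhs = CNF[symbol]
    if isinstance(rhs, str):       # A -> a
        return rhs
    left, right = rhs              # A -> B C
    return expand(left) + expand(right)

def is_cnf(grammar):
    """Every right-hand side is one terminal or exactly two nonterminals."""
    for rhs in grammar.values():
        if isinstance(rhs, str):
            if len(rhs) != 1 or rhs in grammar:
                return False
        elif len(rhs) != 2 or any(sym not in grammar for sym in rhs):
            return False
    return True

print(is_cnf(CNF))    # True
print(expand("S"))    # uagagagcua
```

Reading the result from the outside in gives the pairs u-a, a-u and g-c around the loop agag, which matches the U-A, A-U and G-C stem of the figure.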

  87. [Figure: parse tree rooted at S0, with internal nodes S1 to S12 and leaves u, a, a, u, g, c visible]
