Modeling and Searching for Non-Coding RNA ! W.L. Ruzzo ! http://www.cs.washington.edu/homes/ruzzo ! http://www.cs.washington.edu/homes/ruzzo/courses/gs541/10sp
GENOME 541 Syllabus ! “… protein and DNA sequence analysis … to determine the "periodic table of biology," i.e., the list of proteins …, which can be regarded as the first stage in…” ! No mention of RNA… !
The Message ! noncoding RNA ! Cells make lots of RNA ! Functionally important, functionally diverse ! Structurally complex ! New tools required ! ! alignment, discovery, search, scoring, etc.
Rough Outline ! Today ! Noncoding RNA Examples ! RNA structure prediction ! Lecture 2 ! RNA “motif” models ! Search ! Lecture 3 ! Motif discovery ! Applications
RNA ! DNA: DeoxyriboNucleic Acid ! RNA: RiboNucleic Acid ! Like DNA, except: ! Keeps the 2′-OH on ribose (backbone sugar) that DNA's deoxyribose lacks ! Uracil (U) in place of thymine (T); U pairs with A ! A, G, C as before ! [Figure: thymine vs. uracil; thymine carries a CH3 group that uracil lacks]
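The alphabet difference above (U in place of T; A, G, C unchanged) can be sketched in a few lines of Python. This is only an illustration; the function name `dna_to_rna` is hypothetical, and the sketch rewrites the same strand in the RNA alphabet rather than modeling transcription from a template strand.

```python
def dna_to_rna(dna: str) -> str:
    """Rewrite a DNA sequence in the RNA alphabet (hypothetical helper).

    Only T changes (to U); A, G and C are kept as they are.
    """
    dna = dna.upper()
    unexpected = set(dna) - set("ACGT")
    if unexpected:
        raise ValueError(f"not a DNA sequence, found: {sorted(unexpected)}")
    return dna.replace("T", "U")


rna = dna_to_rna("GATTACA")
print(rna)  # GAUUACA
```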
Fig. 2. The arrows show the situation as it seemed in 1958. Solid arrows represent probable transfers, dotted arrows possible transfers. The absent arrows (compare Fig. 1) represent the impossible transfers postulated by the central dogma. They are the three possible arrows starting from protein.
“Classical” RNAs ! rRNA - ribosomal RNA (~4 kinds, 120-5k nt) ! tRNA - transfer RNA (~61 kinds, ~ 75 nt) ! RNaseP - tRNA processing (~300 nt) ! snRNA - small nuclear RNA (splicing: U1, etc, 60-300nt) ! a handful of others !
RNA Secondary Structure: RNA makes helices too ! Usually single stranded ! [Figure: a single RNA strand, 5´ to 3´, folded back on itself so that complementary bases form base pairs, with unpaired loop regions between the paired stems]
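A minimal sketch of what "base pairs" means for a single folded strand. It assumes the common dot-bracket notation (not used on the slide), where matching parentheses mark paired positions and dots mark unpaired ones. The helper names and the example sequence are made up for illustration; G-U "wobble" pairs are allowed alongside the Watson-Crick pairs.

```python
# Pairs an RNA helix can use: Watson-Crick (A-U, G-C) plus G-U wobble.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    """True if bases a and b can form a canonical or wobble pair."""
    return (a, b) in ALLOWED_PAIRS


def pairs_from_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)              # opening side of a pair
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


seq = "GGGAAAUCC"        # hypothetical hairpin: 3-pair stem, 3-nt loop
struct = "(((...)))"
pairs = pairs_from_dot_bracket(struct)
for i, j in pairs:
    print(i, j, seq[i] + "-" + seq[j], can_pair(seq[i], seq[j]))
```

The same pair list is the basic data structure that structure prediction and motif models work with: a set of index pairs over one sequence.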
Bacteria ! Triumph of proteins ! ~ 80% of genome is coding DNA ! Functionally diverse ! ! receptors ! ! motors ! ! catalysts ! ! regulators (Monod & Jacob, Nobel prize 1965) ! ! …
Proteins catalyze & regulate biochemistry
Not the only way! ! Riboswitch alternatives to the protein way (protein pathway figure: Alberts, et al, 3e) ! SAM-I: Grundy & Henkin, Mol. Microbiol 1998; Epshtein, et al., PNAS 2003; Winkler et al., Nat. Struct. Biol. 2003 ! SAM-II: Corbino et al., Genome Biol. 2005 ! SAM-III: Fuchs et al., NSMB 2006 ! SAM-IV: Weinberg et al., RNA 2008 ! Meyer, et al., BMC Genomics 2009
Riboswitches ! ~ 20 ligands known; multiple nonhomologous solutions for some ! dozens to hundreds of instances of each ! TPP known in archaea & eukaryotes ! one known in bacteriophage ! on/off; transcription/translation; splicing; combinatorial control ! In some bacteria, more riboregulators identified than protein TFs ! all found since ~2003 !
ncRNA Example: T-boxes !
ncRNA Example: 6S ! medium size (175nt) ! structured ! highly expressed in E. coli in certain growth conditions ! sequenced in 1971; function unknown for 30 years !
6S mimics an open promoter ! Bacillus/Clostridium ! Actinobacteria ! E. coli ! Barrick et al. RNA 2005; Trotochaud et al. NSMB 2005; Willkomm et al. NAR 2005
Weinberg Z, Perreault J, Meyer MM & Breaker RR. Exceptional structured noncoding RNAs revealed by bacterial metagenome analysis. Nature, Vol 462, 3 December 2009, doi:10.1038/nature08586 ! [Figure: RNA classes plotted by average size (nucleotides) against number of multistem junctions plus pseudoknots, marked as ribozyme, not ribozyme, or unknown function. Labeled classes: 23S rRNA, 16S rRNA, Group II intron, Group I intron, RNase P, tmRNA, glmS ribozyme, AdoCbl riboswitch, Lysine riboswitch, GOLLD, HEARO, OLE, IMES-1, IMES-2]
RNAs of unusual size and complexity ! HEARO ! [Figure: a, consensus sequence and secondary structure of HEARO RNA, with a pseudoknot, variable-length regions, an ORF, and 5′ and 3′ integration sites; the stem usually has an A bulge or A-C mismatch. b, sequence comparison of A. variabilis and Nostoc sp. around the integration site]
GOLLD ! [Figure: a, consensus sequence and secondary structure of GOLLD RNA, with several pseudoknots, E-loops, and a variable-length region that can contain a tRNA. Legend: R: A or G, Y: C or U; nt: nucleotides; base pairs annotated as covarying mutations, compatible mutations, or no mutations observed. b, bacterial cell density, GOLLD RNA, and phage genomic DNA (fraction of maximum) over 0 to 22 hours, with no treatment vs. mitomycin C]
RNAs of unusual abundance ! More abundant than 5S rRNA ! From unknown marine organisms
Summary: RNA in Bacteria ! Widespread, deeply conserved, structurally sophisticated, functionally diverse, biologically important uses for ncRNA throughout the prokaryotic world. ! Regulation of MANY genes involves RNA ! In some species, we know identities of more riboregulators than protein regulators ! Dozens of classes & thousands of new examples in just the last 5 years
Vertebrates ! Bigger, more complex genomes ! <2% coding ! But >5% conserved in sequence? ! And 50-90% transcribed? ! And structural conservation, if any, invisible (without proper alignments, etc.) ! What’s going on? !
Vertebrate ncRNAs ! mRNA, tRNA, rRNA, … of course ! PLUS: ! snRNA, spliceosome, snoRNA, telomerase, microRNA, RNAi, SECIS, IRE, piwi-RNA, XIST (X-inactivation), ribozymes, …
MicroRNA ! 1st discovered 1992 in C. elegans ! 2nd discovered 2000, also C. elegans ! and human, fly, everything between ! 21-23 nucleotides ! literally fell off ends of gels ! Hundreds now known in human ! may regulate 1/3-1/2 of all genes ! development, stem cells, cancer, infectious diseases,…
2006 Nobel Prize ! siRNA ! Fire & Mello ! “Short Interfering RNA” ! Also discovered in C. elegans ! Possibly an antiviral defense, shares machinery with miRNA pathways ! Allows artificial repression of most genes in most higher organisms ! Huge tool for biology & biotech