Modeling and Searching for Non-Coding RNA
W.L. Ruzzo

ncRNA: Interest
• extensive noncoding sequence conservation
• even more extensive transcription
• "invisible" structural conservation?
• many RNA binding proteins
• examples: microRNAs, riboswitches
• Bottom line: important regulatory roles

Outline
• Why RNA?
• Examples of RNA biology
• Computational Challenges
  – Modeling
  – Search
  – Inference

[Figure] Fig. 2. The arrows show the situation as it seemed in 1958. Solid arrows represent probable transfers, dotted arrows possible transfers. The absent arrows (compare Fig. 1) represent the impossible transfers postulated by the central dogma. They are the three possible arrows starting from protein.
The "Central Dogma"
DNA → RNA → Protein
[Figure: cell, with labels gene, DNA (chromosome), RNA (messenger), Protein]

RNA Secondary Structure: RNA makes helices too
[Figure: folded RNA with base pairs labeled]

"Classical" RNAs
• mRNA
• tRNA
• rRNA
• snRNA (small nuclear - splicing)
• snoRNA (small nucleolar - guides for t/rRNA modifications)
• RNAseP (tRNA maturation; ribozyme in bacteria)
• SRP (signal recognition particle; co-translational targeting of proteins to membranes)
• telomerases

Non-coding RNA
• Messenger RNA - codes for proteins
• Non-coding RNA - all the rest
• Before, say, mid 1990's, 1-2 dozen known (critically important, but narrow roles: e.g. tRNA)
• Since mid 90's dramatic discoveries
  – Regulation, transport, stability/degradation
  – E.g. "microRNA": ≈ 100's in humans
• By some estimates, ncRNA >> mRNA
Bacteria
• Triumph of proteins
• 80% of genome is coding DNA
• Functionally diverse
  – receptors
  – motors
  – catalysts
  – regulators (Monod & Jacob, Nobel prize 1965)
  – …

Gene Regulation: The MET Repressor
[Figure: repressor protein, SAM, and DNA. Alberts, et al, 3e.]

The protein way / Riboswitch alternative
[Figure: protein bound to SAM (Alberts, et al, 3e.) beside the riboswitch bound to SAM]
Grundy & Henkin, Mol. Microbiol 1998
Epshtein, et al., PNAS 2003
Winkler et al., Nat. Struct. Biol. 2003
The protein way / Riboswitch alternatives
[Figure: the protein way (Alberts, et al, 3e.) beside the riboswitch structures]
• SAM-I: Grundy, Epshtein, Winkler et al., 1998, 2003
• SAM-II: Corbino et al., Genome Biol. 2005
• SAM-III: Fuchs et al., NSMB 2006
• SAM-IV: Weinberg et al., RNA 2008

Widespread, deeply conserved, structurally sophisticated, functionally diverse, biologically important uses for ncRNA throughout prokaryotic world.
[Figure legend: boxed = confirmed riboswitch (+2 more)]
Weinberg, et al. Nucl. Acids Res., July 2007 35: 4809-4819.
RNA on the Rise
In humans
• more RNA- than DNA-binding proteins?
• much more conserved DNA than coding
• MUCH more transcribed DNA than coding
In bacteria
• regulation of MANY genes involves RNA
• dozens of classes & thousands of new examples in just last 5 years

Human Predictions

Evofold
S Pedersen, G Bejerano, A Siepel, K Rosenbloom, K Lindblad-Toh, ES Lander, J Kent, W Miller, D Haussler, "Identification and classification of conserved RNA secondary structures in the human genome." PLoS Comput. Biol., 2, #4 (2006) e33.
• 48,479 candidates (~70% FDR?)

RNAz
S Washietl, IL Hofacker, M Lukasser, A Hutenhofer, PF Stadler, "Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome." Nat. Biotechnol., 23, #11 (2005) 1383-90.
• 30,000 structured RNA elements
• 1,000 conserved across all vertebrates
• ~1/3 in introns of known genes, ~1/6 in UTRs
• ~1/2 located far from any known gene

FOLDALIGN
E Torarinsson, M Sawera, JH Havgaard, M Fredholm, J Gorodkin, "Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure." Genome Res., 16, #7 (2006) 885-9.
• 1800 candidates from 36970 (of 100,000) pairs
CMfinder
Torarinsson, Yao, Wiklund, Bramsen, Hansen, Kjems, Tommerup, Ruzzo and Gorodkin. Comparative genomics beyond sequence based alignments: RNA structures in the ENCODE regions. Genome Research, Feb 2008, 18(2):242-251. PMID: 18096747
• 6500 candidates in ENCODE alone (better FDR, but still high)

Fastest Human Gene?
[Figure]

Origin of Life?
Life needs
• information carrier: DNA
• molecular machines, like enzymes: Protein
• making proteins needs DNA + RNA + proteins
• making (duplicating) DNA needs proteins
Horrible circularities! How could it have arisen in an abiotic environment?

Origin of Life?
• RNA can carry information too (RNA double helix)
• RNA can form complex structures
• RNA enzymes exist (ribozymes)
• The "RNA world" hypothesis: 1st life was RNA-based
• Some extant RNAs are relicts of that origin; some are "modern" inventions
ncRNA Example: Xist
• large (12kb?)
• largely unstructured RNA
• required for X-inactivation in mammals

ncRNA Example: 6S
• medium size (175nt)
• structured
• highly expressed in E. coli in certain growth conditions
• sequenced in 1971; function unknown for 30 years

6S mimics an open promoter
[Figure: E. coli 6S RNA]
Barrick et al. RNA 2005
Trotochaud et al. NSMB 2005
Willkomm et al. NAR 2005

ncRNA Example: IRE
Iron Response Element: a short conserved stem-loop, bound by iron response proteins (IRPs). Found in UTRs of various mRNAs whose products are involved in iron metabolism. E.g., the mRNA of ferritin (an iron storage protein) contains one IRE in its 5' UTR. When iron concentration is low, IRPs bind the ferritin mRNA IRE, repressing translation. Binding of multiple IREs in the 3' and 5' UTRs of the transferrin receptor (involved in iron acquisition) leads to increased mRNA stability. These two activities form the basis of iron homeostasis in the vertebrate cell.
Iron Response Element
IRE (partial seed alignment; species labels survive only for the last six rows):
         GUUCCUGCUUCAACAGUGUUUGGAUGGAAC
         UUUCUUC.UUCAACAGUGUUUGGAUGGAAC
         UUUCCUGUUUCAACAGUGCUUGGA.GGAAC
         UUUAUC..AGUGACAGAGUUCACU.AUAAA
         UCUCUUGCUUCAACAGUGUUUGGAUGGAAC
         AUUAUC..GGGAACAGUGUUUCCC.AUAAU
         UCUUGC..UUCAACAGUGUUUGGACGGAAG
         UGUAUC..GGAGACAGUGAUCUCC.AUAUG
         AUUAUC..GGAAGCAGUGCCUUCC.AUAAU
Cav.por. UCUCCUGCUUCAACAGUGCUUGGACGGAGC
Mus.mus. UAUAUC..GGAGACAGUGAUCUCC.AUAUG
Mus.mus. UUUCCUGCUUCAACAGUGCUUGAACGGAAC
Mus.mus. GUACUUGCUUCAACAGUGUUUGAACGGAAC
Rat.nor. UAUAUC..GGAGACAGUGACCUCC.AUAUG
Rat.nor. UAUCUUGCUUCAACAGUGUUUGGACGGAAC
SS_cons  <<<<<...<<<<<......>>>>>.>>>>>

ncRNA Example: MicroRNAs
• short (~22 nt) unstructured RNAs excised from ~75nt precursor hairpin
• approx antisense to mRNA targets, often in 3' UTR
• regulate gene activity, e.g. by destabilizing (plants) or otherwise suppressing (animals) message
• hundreds and growing, each w/ perhaps 10x targets
• Some conserved human to worm; others evolving rapidly

ncRNA Example: T-boxes
[Figure]

ncRNA Example: Riboswitches
• UTR structure that directly senses/binds small molecules & regulates mRNA
• widespread in prokaryotes
• some in eukaryotes
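The SS_cons row of the IRE seed alignment marks which columns pair: each `<` is matched with a later `>`, and `.` columns are unpaired. As a rough illustration, the sketch below turns such a string into column pairs and checks whether a sequence has compatible bases at those columns. The function names are hypothetical, and treating G-U wobble pairs as acceptable is an assumption of this sketch, not something stated on the slide.

```python
# Minimal sketch: read a bracket-style consensus structure and test a row against it.
# Function names are illustrative; accepting G-U wobble pairs is an assumption.

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(ss_cons):
    """Return sorted (i, j) column pairs (0-based) for a '<', '>', '.' string."""
    stack, pairs = [], []
    for col, ch in enumerate(ss_cons):
        if ch == "<":
            stack.append(col)
        elif ch == ">":
            if not stack:
                raise ValueError("unbalanced '>' at column %d" % col)
            pairs.append((stack.pop(), col))
    if stack:
        raise ValueError("unbalanced '<' at column %d" % stack[-1])
    return sorted(pairs)


def paired_fraction(seq, pairs):
    """Fraction of consensus pairs where the aligned row has a compatible base pair."""
    compatible = sum((seq[i], seq[j]) in CANONICAL for i, j in pairs)
    return compatible / len(pairs)


ss_cons = "<<<<<...<<<<<......>>>>>.>>>>>"
first_row = "GUUCCUGCUUCAACAGUGUUUGGAUGGAAC"  # first row of the alignment above
pairs = parse_pairs(ss_cons)
print(len(pairs), paired_fraction(first_row, pairs))
```

Rows that differ in sequence but still score highly here are the kind of "invisible" structural conservation that sequence-only comparison misses.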
Example: Glycine Regulation
• How is glycine level regulated?
• Plausible answer:
[Figure: transcription factor (TF) bound to DNA upstream of the glycine cleavage enzyme gene; gce protein; glycine (g)]
transcription factors (proteins) bind to DNA to turn nearby genes on or off

The Glycine Riboswitch
• Actual answer (in many bacteria):
[Figure: DNA with the glycine cleavage enzyme gene; gce mRNA (5′ to 3′) binding glycine (g); gce protein]
Mandal et al. Science 2004

[Figure: riboswitch structure, 5' end, upstream of the gcvT ORF, 3' end]
(Mandal, Lee, Barrick, Weinberg, Emilsson, Ruzzo, Breaker, Science 2004)
Fig. 3. Cooperative binding of two glycine molecules by the VC I-II RNA. Plot depicts the fraction of VC II (open) and VC I-II (solid) bound to ligand versus the concentration of glycine. The constant, n, is the Hill coefficient for the lines as indicated that best fit the aggregate data from four different regions (fig. S3). Shaded boxes demark the dynamic range (DR) of glycine concentrations needed by the RNAs to progress from 10%- to 90%-bound states.

And…
• More examples means better alignment
• Understand phylogenetic distribution
• Find riboswitch in front of new gene
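The Fig. 3 caption refers to a Hill coefficient n and to the dynamic range between the 10%- and 90%-bound states. A small sketch of the standard Hill equation shows why cooperativity matters: under that model the dynamic range is 81^(1/n), so it is 81-fold for n = 1 and only 9-fold for n = 2. The function names and the half-saturation value below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the standard Hill binding model (not fitted to the paper's data).
# k_half is the ligand concentration giving 50% bound; its value here is hypothetical.


def hill_fraction_bound(conc, k_half, n):
    """Fraction of RNA bound at ligand concentration conc with Hill coefficient n."""
    return conc ** n / (k_half ** n + conc ** n)


def dynamic_range(n):
    """Fold change in concentration needed to go from 10% to 90% bound.

    Solving the Hill equation for 0.9 and 0.1 gives conc^n = 9*k^n and k^n/9,
    so the ratio of the two concentrations is 81 ** (1 / n).
    """
    return 81.0 ** (1.0 / n)


for n in (1, 2):
    print("n =", n, "dynamic range ~", round(dynamic_range(n), 2), "fold")
```

A narrower dynamic range means the switch responds more sharply to a small change in glycine concentration, which is the practical consequence of the two cooperating binding sites.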
Riboswitches
• ~20 ligands known; multiple nonhomologous solutions for some
• dozens to hundreds of instances of each
• TPP known in archaea & eukaryotes
• on/off; transcription/translation; splicing; combinatorial control
• In some bacteria, more riboregulators identified than protein TFs
• all found since ~2003

Why?
• RNAs fold, and function
• Nature uses what works

Outline
• ncRNA: what/why?
• What does computation bring?
• How to model and search for ncRNA?
• Faster search
• Better model inference

Homology search
Sequence-based
– Smith-Waterman
– FASTA
– BLAST
Sharp decline in sensitivity at ~60-70% identity
So, use structure, too
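For reference, the Smith-Waterman method named on the homology search slide can be sketched in a few lines. This version computes only the best local alignment score with a linear gap penalty. The scoring values (match +2, mismatch -1, gap -2) and the function name are illustrative assumptions; real tools use tuned substitution matrices and affine gaps.

```python
# Minimal sketch of Smith-Waterman local alignment scoring (score only, no traceback).
# Scoring parameters are illustrative assumptions, not values from the slides.


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]  # column 0 is always 0 for local alignment
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score = max(
                0,                  # start a fresh local alignment here
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                cur[j - 1] + gap,   # gap in a
            )
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


print(smith_waterman_score("GGACGUCC", "UUACGUAA"))
```

Because the score depends only on which letters match, two RNAs that keep the same helices through compensating base changes can score poorly, which is the motivation for adding structure to the search.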