The Brave New World of Non-Coding RNAs, Peter F. Stadler



  1. The Brave New World of Non-Coding RNAs. Peter F. Stadler. Bioinformatics Group, Dept. of Computer Science & Interdisciplinary Center for Bioinformatics, University of Leipzig; Max-Planck-Institute for Mathematics in the Sciences; RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology; Institute for Theoretical Chemistry, Univ. of Vienna (external faculty); The Santa Fe Institute (external faculty). Jena, Aug 2010. Slide footer: Peter F. Stadler (Leipzig), Modern RNA World, Jena, Aug 2010.

  2. The Central Dogma: DNA → RNA → protein (DNA → RNA is transcription, RNA → protein is translation). Only 3% of the non-repetitive part of the genome codes for proteins. Is all the rest junk DNA? Are all the repeats just genomic parasites?

  3. Pervasive Transcription: more than 90% of the non-repetitive genome shows evidence for transcription in at least one direction. The ENCODE Consortium, Nature 447: 799-816 (2007).

  4. Transcriptome Complexity. [Figure: genome browser view of the Hox A cluster, chr7: 26.90m to 27.05m, with tracks RNAz_set1_50, EvoFold, sno/miRNA, mRNA, Affy Transcription, Conservation, RepeatMasker, GENCODE and GENCODE putative. Labelled features include HOXA1, HOXA2, HOXA3, HOXA4, HOXA5, HOXA6, HOXA7, HOXA9, HOXA10, HOXA11, HOXA13, EVX1, hoxa11-as, alternative splicing, the elements chr7.279, chr7.283, chr7.287, chr7.290 and chr7.295, and the AC004079.7, AC010990.1 and AC004080.x transcripts.]

  5. Transcriptome Complexity. Science 316: 1484-1488 (2007).

  6. H. pylori doesn’t read textbooks. Mapping of transcription start sites in Helicobacter pylori: secondary start sites and pervasive antisense transcription. [Figure: diagram of start-site categories (counts as extracted, with the assignment to categories inferred from label order: secondary 119, internal 440, antisense 969, primary 810, orphan 38) and a map of the cag island showing cag10, cag12, cag16, cag17, cag19, cag20, cag22, cag23, cag24, cag25 and cagA at genome positions 564,000 to 584,000.] Nature 464: 250-255 (2010).

  7. A New Paradigm of Molecular Biology!
     • There is no junk! Most of the human genome is transcribed, and there are good reasons to believe that most of the transcripts have a function.
     • Most “genes” do not code for proteins.
     • We have to re-think, and maybe even abandon, the very notion of a gene.
     • Are these ncRNAs really functional?

  8. Evidence for ncRNA function
     • A small number of well-studied transcripts have functions identifiable by genetic methods (e.g. deletion/complementation).
     • Statistical arguments:
       • Differential regulation
       • Conservation at sequence level
       • Conservation of RNA structure
       • Conservation of splicing patterns
       • Association with (disease) phenotypes
       • Specific processing

  9. CHD QTL Locus
     • The majority of QTLs for complex multi-genic diseases fall into non-coding regions.
     • Coronary heart disease (CHD) is associated with a 58 kb region on chr. 9p21.
     • This non-coding locus produces the ANRIL transcript(s).
     • ANRIL expression is associated with atherosclerosis risk.
     Holdt et al., Arterioscler Thromb Vasc Biol. 30: 620-627 (2010); McPherson et al., Science (2007).

  10. Computational RNA Gene Finding
      • Many (but by no means all) known functional RNAs are structured, i.e. certain base pairing patterns must be conserved.
      • This implies that substitutions are not random: they must be consistent with the base pair (GC → GU) or even compensate to restore it (GC → AU).
      • Empirical observation: known ncRNAs are (a little bit) more stable than genomic background with the same base composition.
      • Idea: use this to build a gene finder.
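The distinction between consistent (GC → GU) and compensatory (GC → AU) substitutions can be made concrete with a small sketch. The function name `classify_pair_change` and the labels it returns are illustrative assumptions, not part of the talk or of any particular tool; the only biology assumed is that Watson-Crick and GU wobble pairs can form in a helix.

```python
# Base pairs that can form in an RNA helix: Watson-Crick plus GU wobble.
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def classify_pair_change(old, new):
    """Classify a substitution at a base-paired position (hypothetical helper).

    old, new: 2-tuples (5' partner, 3' partner) of RNA nucleotides.
    """
    if old not in CANONICAL:
        raise ValueError(f"reference pair {old} cannot base-pair")
    if new == old:
        return "unchanged"
    if new not in CANONICAL:
        return "incompatible"      # the pair is lost
    changed = sum(a != b for a, b in zip(old, new))
    # One side changed and the pair survives: consistent mutation.
    # Both sides changed and the pair survives: compensatory mutation.
    return "consistent" if changed == 1 else "compensatory"


if __name__ == "__main__":
    for new in [("G", "U"), ("A", "U"), ("G", "A")]:
        print(("G", "C"), "->", new, classify_pair_change(("G", "C"), new))
```

Counting how often aligned columns fall into each class is one simple way to see whether substitutions in a candidate region respect a proposed structure.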

  11. RNAz: a gene finder for structured RNA. [Figure: six scatter plots of structure conservation index (y-axis, 0 to 1.2) against z-score (x-axis) for tRNA, 5S rRNA, signal recognition particle RNA, U5 spliceosomal RNA, RNAseP and U2 spliceosomal RNA.] Separation of native ncRNAs from random controls in two dimensions. Proc. Natl. Acad. Sci. USA 102: 2454-2459 (2005).
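The two axes of the plot can be sketched numerically. The following assumes the usual definitions: the z-score compares the folding energy of a native sequence with energies of shuffled sequences of the same base composition, and the structure conservation index (SCI) is the consensus folding energy of an alignment divided by the mean folding energy of its individual sequences. All energies are taken as given inputs (in kcal/mol, computed elsewhere with a folding program); the function names and the example numbers are made up for illustration.

```python
from statistics import mean, stdev


def z_score(native_energy, shuffled_energies):
    """Number of standard deviations by which the native folding energy
    differs from the energies of shuffled control sequences.
    Negative values mean the native sequence is more stable."""
    mu = mean(shuffled_energies)
    sigma = stdev(shuffled_energies)   # sample standard deviation
    if sigma == 0:
        raise ValueError("shuffled energies have zero variance")
    return (native_energy - mu) / sigma


def structure_conservation_index(consensus_energy, single_energies):
    """Consensus folding energy of the alignment divided by the mean
    folding energy of the individual sequences."""
    avg = mean(single_energies)
    if avg == 0:
        raise ValueError("mean single-sequence energy is zero")
    return consensus_energy / avg


if __name__ == "__main__":
    # Made-up energies, for illustration only.
    z = z_score(-30.0, [-20.0, -22.0, -18.0, -20.0])
    sci = structure_conservation_index(-24.0, [-30.0, -28.0, -32.0])
    print(f"z-score = {z:.2f}, SCI = {sci:.2f}")
```

A strongly negative z-score together with an SCI close to 1 is the region of the plot where native ncRNAs separate from random controls.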

  12. Structured RNAs in the Human Genome. [Figure, four panels. (a) Chr. 13, 92.0M to 98.0M: most conserved noncoding regions (present in at least human/mouse/rat/dog), RNAz structural RNAs (P>0.5 and P>0.9), RefSeq genes. (b) and (c) Close-ups of Chr. 13 (90801000 to 90801500) and Chr. 11 (93104k to 93108k) showing RNAz structural RNAs (P>0.5, P>0.9) together with annotated miRNAs (mir-17, mir-18, mir-19a, mir-20, mir-19b-1, mir-92-1), H/ACA snoRNAs (ACA25, ACA1, ACA18, ACA32, ACA8, ACA40) and C/D-box snoRNAs (mgh28S-2410, mgh28S-2412). (d) Consensus secondary structure drawing with the alignment below.]
      Structure  (((((..((((((..((((((((.((.(((((...(((........)))...))))).)).))))))))...))))))....)))))
      Human      GTCAGAATAATGTCAAAGTGCTTACAGTGCAGGTAGTGATATGT-GCATCTACTGCAGTGAAGGCACTTGTAGCATTA-TG-GTGAC
      Mouse      GTCAGAATAATGTCAAAGTGCTTACAGTGCAGGTAGTGATGTGT-GCATCTACTGCAGTGAGGGCACTTGTAGCATTA-TG-CTGAC
      Rat        GTCAGGATAATGTCAAAGTGCTTACAGTGCAGGTAGTGGTGTGT-GCATCTACTGCAGTGAAGGCACTTGTGGCATTG-TG-CTGAC
      Chicken    GTCAGAGTAATGTCAAAGTGCTTACAGTGCAGGTAGTGATATATAGAACCTACTGCAGTGAAGGCACTTGTAGCATTA-TG-TTGAC
      Zebrafish  GTCAATGTATTGTCAAAGTGCTTACAGTGCAGGTAGTATTATGGAATATCTACTGCAGTGGAGGCACTTCTAGCAATA-CACTTGAC
      Fugu       GTCTGTGTATTGCCAAAGTGCTTACAGTGCAGGTAGTTCTATGTGACACCTACTGCAATGGAGGCACTTACAGCAGTACTC-TTGAC
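The consensus structure on this slide is written in dot-bracket notation, where matching parentheses mark paired alignment columns. A minimal sketch of how such a string can be turned into a list of paired positions, using the structure line and the human sequence copied from the slide, is shown here. The helper name `parse_dot_bracket` is an assumption for illustration; the sequence is DNA as given on the slide, so T stands in for U.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) index pairs (0-based) for matching parentheses."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)                  # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))   # pair with most recent '('
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Structure line and human sequence as they appear on the slide.
STRUCTURE = "(((((..((((((..((((((((.((.(((((...(((........)))...))))).)).))))))))...))))))....)))))"
HUMAN = "GTCAGAATAATGTCAAAGTGCTTACAGTGCAGGTAGTGATATGT-GCATCTACTGCAGTGAAGGCACTTGTAGCATTA-TG-GTGAC"

pairs = parse_dot_bracket(STRUCTURE)
# Bases found at each pair of structurally linked alignment columns.
paired_bases = [(HUMAN[i], HUMAN[j]) for i, j in pairs]

if __name__ == "__main__":
    for (i, j), (a, b) in zip(pairs, paired_bases):
        print(f"columns {i + 1:2d}-{j + 1:2d}: {a}-{b}")
```

Running the same lookup for each species in the alignment shows which paired columns vary between species while still forming a base pair.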

  13. Structured RNAs in the Human Genome. Mammalian genomes contain ∼10^5 structured RNA motifs. Statistics of the highest-confidence fraction (∼36000). [Figure: breakdown by genomic location, with categories: intron of coding region; known gene; < 10 kb from nearest gene; 3’-UTR (exon or intron); 5’-UTR (exon or intron); > 10 kb from nearest gene. Values as extracted, not matched to categories: 3745, 2866, 15380, 16860, 2830, 11205.] Nature Biotech. 23: 1383-1390 (2005).

  14. Finding mRNA-like ncRNAs
      long = contains at least one intron
      Predict non-coding transcripts by predicting conserved short introns.
      Why introns?
      • Intron evolution is slow and essentially independent of the evolution of the mature sequence.
      • Splice sites are often conserved.
      • Disruption of correct splicing usually destroys function.
      • Non-coding transcripts do not have randomly placed large in/dels.
      Why short introns?
      • Most Drosophila introns are short.
      • They can be accurately predicted (94% with both splice sites correct).
      Intron prediction (Lim & Burge 1999): machine learning using patterns of donor, acceptor, intron length, branch point, and intron composition.
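As a very rough illustration of the first filtering step only, the sketch below lists candidate short introns as stretches that begin with the canonical GT donor dinucleotide and end with the canonical AG acceptor dinucleotide. This is not the Lim & Burge model, which according to the slide also scores donor and acceptor patterns, intron length, branch point and composition. The function name and the length bounds `min_len` and `max_len` are assumptions chosen for the example.

```python
def candidate_short_introns(seq, min_len=40, max_len=120):
    """Return (start, end) index pairs, end exclusive, of GT...AG stretches
    whose length lies between min_len and max_len (both inclusive).

    Only the canonical splice-site dinucleotides are checked; the length
    bounds are illustrative assumptions, not values from the talk.
    """
    seq = seq.upper().replace("U", "T")   # accept RNA or DNA input
    hits = []
    for start in range(len(seq) - 1):
        if seq[start:start + 2] != "GT":  # donor site must start the intron
            continue
        last_end = min(start + max_len, len(seq))
        for end in range(start + min_len, last_end + 1):
            if seq[end - 2:end] == "AG":  # acceptor site must end the intron
                hits.append((start, end))
    return hits


if __name__ == "__main__":
    # Toy sequence: 3 nt exon, a 54 nt GT...AG intron, 3 nt exon.
    toy = "AAA" + "GT" + "C" * 50 + "AG" + "AAA"
    print(candidate_short_introns(toy))
```

A real predictor would score each candidate rather than accept every GT...AG stretch, and would then check that the candidate is conserved across species.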

  15. mlncRNAs – splice sites
