
Earl Bellinger and Fabio Mendes: What are microarrays again? (PowerPoint presentation)



  1. Hangauer MJ, Vaughn IW, McManus MT (2013). Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs. PLoS Genetics 9(6): 1-13. Earl Bellinger and Fabio Mendes

  2. What are microarrays again? A microarray is a 2D array on a solid substrate (usually a glass slide or silicon thin-film cell) that assays large amounts of biological material using high-throughput screening methods. "DNA microarrays are a well-established technology for measuring gene expression levels. Microarrays designed for this purpose use relatively few probes for each gene and are biased toward known and predicted gene structures." Mockler TC, Chan S, Sundaresan A, Chen H, Jacobsen SE, Ecker JR (2005). Applications of DNA tiling arrays for whole-genome analysis. Genomics 85(1): 1-15.

  3. What are microarrays again? ● Tiling arrays: "Recently, high-density oligonucleotide-based whole-genome microarrays have emerged as a preferred platform for genomic analysis beyond simple gene expression profiling. Potential uses for such whole-genome arrays include empirical annotation of the transcriptome (...)" Mockler TC, Chan S, Sundaresan A, Chen H, Jacobsen SE, Ecker JR (2005). Applications of DNA tiling arrays for whole-genome analysis. Genomics 85(1): 1-15.

  4. Back to the paper... 1. Noncoding DNA ("junk" DNA) and noncoding RNA (ncRNA); 2. Intergenic DNA can be transcribed; 2.1. According to ENCODE, over 80% of the human genome "serves some purpose, biochemically speaking". The ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489(7414): 57-74. 2.2. A critique: "Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten." Graur D, Zheng Y, Price N, Azevedo RBR, Zufall RA, Elhaik E (2013). On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution. 3. What is the extent of intergenic transcription?

  5. ● What is the extent of noncoding transcription? Previous approaches: tiling arrays (problem: repetitive elements) and sequencing-based studies (problem: they covered only a fraction of the genome and a small number of tissues); ● Long intergenic ncRNAs (lincRNAs): intergenic transcripts longer than 200 nucleotides that lack protein-coding capacity; ● Only a limited set of lincRNAs is annotated (GENCODE found only 5,000), compared with what ENCODE's results would lead one to expect; ● This study: ○ Assembles a "unique set of RNAseq data derived from both novel (6) and published (121) datasets that complements and significantly expands prior efforts".

  6. ● This study: ○ Filtered out: ■ transcripts overlapping protein-coding genes and pseudogenes; ■ transcripts longer than 200 nucleotides with an ORF longer than 100 amino acids (and transcripts overlapping those); ■ ncRNA genes already known not to be lincRNAs; ■ transcripts connected to a protein-coding gene by an RNA-seq read (i.e., transcripts overlapping "extended" gene structures were removed); ■ transcripts with FPKM < 1 (roughly equivalent to one copy per cell); ● FPKM: fragments per kilobase of transcript per million mapped fragments, $\mathrm{FPKM} = \dfrac{\text{fragments mapped to the transcript}}{(\text{transcript length in kb}) \times (\text{total mapped fragments}/10^{6})}$; essentially a length- and depth-normalized measure of transcript abundance. ○ Merged transcripts sharing an exon (to avoid redundancy).
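The FPKM definition and the FPKM < 1 filter on slide 6 can be illustrated with a minimal Python sketch. It divides the fragment count by transcript length in kilobases and by library size in millions of mapped fragments. The function names, the example counts, and the use of `>= 1.0` as the keep condition are illustrative assumptions. This is not the paper's actual pipeline.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    length_kb = transcript_length_bp / 1_000               # normalize for transcript length
    millions_mapped = total_mapped_fragments / 1_000_000   # normalize for sequencing depth
    return fragments / length_kb / millions_mapped


def passes_expression_filter(fpkm_value: float, threshold: float = 1.0) -> bool:
    """Keep a transcript only if FPKM >= threshold (the slide drops FPKM < 1)."""
    return fpkm_value >= threshold


# Hypothetical transcript: 500 fragments, 2,500 bp long, in a library of 20 million mapped fragments.
example = fpkm(500, 2_500, 20_000_000)
print(example, passes_expression_filter(example))  # 10.0 True
```

Dividing by library size as well as length is what makes FPKM values comparable across the many RNA-seq datasets the study combines.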

  7. RESULTS ● RNA-seq reads from the analyzed datasets mapped to 78.9% of the genome; ● When information from known genes, ESTs, and cDNAs was incorporated, 85.2% of the genome showed evidence of transcription; ● >94% of the final set of merged lincRNAs consists of transcripts assembled de novo from RNA-seq data.

  8. RESULTS ● Read depth = coverage; base calls = (number of positions at a given read depth) × (read depth); NM genes = annotated protein-coding genes; ● Protein-coding gene exons have a larger fraction of base calls at high coverage (i.e., they are more highly transcribed), as expected; ● Intergenic regions "contain many highly expressed (transcribed) regions".
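Slide 8 defines base calls at a given depth as (number of positions at that depth) × (read depth). Here is a small Python sketch of that definition, assuming we already have a per-base depth list for a region. It computes the fraction of base calls that fall at "high" coverage. The `min_depth` cutoff and the example depths are hypothetical; the slide does not give the paper's coverage bins.

```python
from collections import Counter


def base_calls_by_depth(per_position_depth):
    """Map each read depth to its base calls: (# positions at that depth) * depth."""
    positions_at_depth = Counter(per_position_depth)
    return {depth: count * depth for depth, count in positions_at_depth.items()}


def fraction_at_high_coverage(per_position_depth, min_depth):
    """Fraction of all base calls that come from positions with depth >= min_depth."""
    calls = base_calls_by_depth(per_position_depth)
    total = sum(calls.values())
    if total == 0:
        return 0.0  # no reads at all over this region
    high = sum(c for depth, c in calls.items() if depth >= min_depth)
    return high / total


# Hypothetical per-base depths for a short exon and a short intergenic stretch.
exon = [0, 2, 2, 10, 10, 10]
intergenic = [0, 0, 1, 1, 1, 3]
print(fraction_at_high_coverage(exon, 5))        # 30 of 34 base calls
print(fraction_at_high_coverage(intergenic, 5))  # 0.0
```

Weighting each position by its depth means a few deeply covered positions dominate the total. That is why highly transcribed exons show a large high-coverage fraction.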

  9. RESULTS ● NR genes = previously annotated ncRNA genes; ● "(...) many regions of high expression do exist within intergenic regions, far more than are accounted for by current ncRNA gene annotation".

  10. What is Epigenetics again? Epigenetics is the study of heritable changes in gene activity which are not caused by changes in the DNA sequence. (...) such changes are DNA methylation and histone modification. "But recently, it has been shown that chromatin modifications can be regarded as indicators of the transcriptional regulatory function and activity of certain types of genomic loci. With recent technological advances making it routine to survey chromatin modifications on a large scale, the epigenetics field is rapidly expanding from examining individual genes to all genes to the entirety of the human genome." Hon GC, Hawkins RD, Ren B (2009). Predictive chromatin signatures in the mammalian genome. Human Molecular Genetics 18(2): R195-201.

  11. What is Epigenetics again? Hon GC, Hawkins RD, Ren B (2009). Predictive chromatin signatures in the mammalian genome. Human Molecular Genetics 18(2): R195-201. ● H3K4me3 and H3K36me3 are canonical epigenetic marks of activation; ● H3K27me3 is a canonical epigenetic mark of repression.

  12. What is Epigenetics again? ● ChIP sequencing (ChIP-seq): "ChIP is the most direct way to identify the binding sites of a single DNA-binding protein or the location of modified histones"; Furey TS (2012). ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions. Nature Reviews Genetics 13: 840-52. "Methods developed to study DNA methylation include the use of (...) affinity enrichment using antibodies specific to 5-methylcytosine." Ku CS, Naidoo N, Wu M, et al. (2011). Studying the epigenome using next-generation sequencing. Journal of Medical Genetics 48: 721-30.

  13. What is Epigenetics again? ● ChIP-chip / ChIP sequencing (ChIP-seq): Ku CS, Naidoo N, Wu M, et al. (2011). Studying the epigenome using next-generation sequencing. Journal of Medical Genetics 48: 721-30.

  14. Back to the paper… RESULTS ● Chromatin signatures were obtained from ChIP-seq studies; ● More highly expressed lincRNAs (FPKM > 5) were significantly enriched for activation marks (H3K4me3 and H3K36me3) compared with less expressed lincRNAs (FPKM < 1); ● Conversely, less expressed lincRNAs were significantly enriched for the repressive mark H3K27me3 compared with more highly expressed lincRNAs.
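The slide reports that marks are "significantly enriched" but does not say which statistical test was used. The Python sketch below therefore swaps in a one-sided Fisher's exact test on a 2×2 table as a stand-in. It first splits lincRNAs at the slide's thresholds (FPKM > 5 vs. FPKM < 1). It then counts how many in each group carry a mark and tests whether the high-expression group has more marked members than chance predicts. The function names, the FPKM values, and the set of marked lincRNAs are hypothetical.

```python
from math import comb


def split_by_expression(fpkm_by_id, high=5.0, low=1.0):
    """Return (ids with FPKM > high, ids with FPKM < low)."""
    high_ids = {t for t, v in fpkm_by_id.items() if v > high}
    low_ids = {t for t, v in fpkm_by_id.items() if v < low}
    return high_ids, low_ids


def enrichment_table(high_ids, low_ids, marked_ids):
    """2x2 counts (a, b, c, d): rows = high/low expression, columns = has mark / lacks mark."""
    a = len(high_ids & marked_ids)
    b = len(high_ids - marked_ids)
    c = len(low_ids & marked_ids)
    d = len(low_ids - marked_ids)
    return a, b, c, d


def fisher_one_sided_greater(a, b, c, d):
    """One-sided Fisher's exact test: P(X >= a) under the hypergeometric null,
    i.e. the chance the high-expression group has at least `a` marked lincRNAs."""
    n = a + b + c + d
    row_high = a + b
    col_marked = a + c
    upper = min(row_high, col_marked)
    tail = sum(comb(col_marked, k) * comb(n - col_marked, row_high - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row_high)


# Hypothetical FPKM values and a hypothetical set of lincRNAs overlapping an H3K4me3 peak.
fpkm_values = {"linc1": 12.0, "linc2": 8.5, "linc3": 6.1, "linc4": 0.2, "linc5": 0.4, "linc6": 0.7}
h3k4me3_marked = {"linc1", "linc2", "linc3"}
high_ids, low_ids = split_by_expression(fpkm_values)
table = enrichment_table(high_ids, low_ids, h3k4me3_marked)
print(table, fisher_one_sided_greater(*table))  # (3, 0, 0, 3) 0.05
```

Testing the repressive mark H3K27me3 works the same way with the two rows swapped, so that the low-expression group is the one tested for enrichment.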

  15. Back to the paper… RESULTS ● Using unsupervised hierarchical clustering, lincRNAs are shown to be differentially transcribed in a tissue-specific fashion; ● "The lincRNAs we describe are specifically regulated (...), attributes inconsistent with transcriptional noise."
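Slide 15 mentions unsupervised hierarchical clustering without giving the distance or linkage. The pure-Python sketch below is one common choice: agglomerative clustering with average linkage. It uses 1 − Pearson r on log2(FPKM + 1) tissue profiles as the distance, which is our assumption. The tissue profiles are hypothetical. Stopping at `n_clusters` is a simplification, since heatmap figures usually show the full dendrogram.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of expression profiles.

    Distance = 1 - Pearson r on log2(FPKM + 1); cluster distance = mean pairwise distance.
    Merging stops when n_clusters clusters remain.
    """
    names = list(profiles)
    logged = {k: [math.log2(v + 1) for v in profiles[k]] for k in names}
    dist = {(a, b): 1 - pearson(logged[a], logged[b]) for a in names for b in names}
    clusters = [[k] for k in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]                          # j > i, so index i is unaffected
    return clusters


# Hypothetical FPKM values of four lincRNAs across four tissues.
tissue_profiles = {
    "linc_A": [50, 1, 1, 1],
    "linc_B": [40, 2, 1, 0],
    "linc_C": [0, 1, 30, 45],
    "linc_D": [1, 0, 25, 60],
}
clusters = average_linkage(tissue_profiles, n_clusters=2)
print(clusters)  # [['linc_A', 'linc_B'], ['linc_C', 'linc_D']]
```

A correlation-based distance groups lincRNAs by the shape of their tissue profile rather than their absolute level, which suits the question of tissue specificity.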

  16. ● Compared lincRNA FPKM* values in polyA+ and polyA− RNA-seq libraries from H9 ESCs and HeLa cells; ● Analyzed transcripts with RNA-seq reads in all four datasets and with FPKM > 1 in at least one of the two fractions for each cell type: ○ 16,819 NM genes ○ 127 lincRNAs ● Showed the polyA+/polyA− FPKM ratio of each lincRNA and NM gene; ● Pearson correlation: ○ lincRNAs = 0.622 (P = 5.5E-15) ○ NM genes = 0.702 (P < 2.2E-16) ● Determined the maximally conserved 50 bp window in each NM gene, lincRNA, and repetitive element (repetitive elements serve as nonconserved control sequences); ● The maximally conserved 50 bp windows of 12 functional human lincRNAs are indicated for comparison. *Fragments Per Kilobase of exon per Million fragments mapped
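Slide 16 combines two computations, sketched below in Python under stated assumptions. First, it correlates polyA+/polyA− FPKM ratios. The slide does not say exactly which vectors are correlated, so we assume one log2 ratio per transcript in each cell type (H9 vs. HeLa), correlated across the transcripts shared by both. The pseudocount is our choice. Second, it finds each transcript's maximally conserved 50 bp window. We assume a list of per-base conservation scores, since the slide does not name the score, and use a running-sum sliding window. All names and inputs are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def polya_log_ratio(fpkm_plus, fpkm_minus, pseudocount=0.01):
    """log2(polyA+ FPKM / polyA- FPKM); the pseudocount (an assumption) avoids division by zero."""
    return math.log2((fpkm_plus + pseudocount) / (fpkm_minus + pseudocount))


def ratio_correlation(cell_a, cell_b):
    """Pearson r of log ratios for transcripts present in both cell types.

    cell_a / cell_b map transcript id -> (polyA+ FPKM, polyA- FPKM).
    """
    shared = sorted(set(cell_a) & set(cell_b))
    x = [polya_log_ratio(*cell_a[t]) for t in shared]
    y = [polya_log_ratio(*cell_b[t]) for t in shared]
    return pearson(x, y)


def max_window_mean(scores, window=50):
    """Highest mean per-base conservation score over any `window`-bp stretch."""
    if len(scores) < window:
        raise ValueError("sequence is shorter than the window")
    running = sum(scores[:window])
    best = running
    for i in range(window, len(scores)):
        running += scores[i] - scores[i - window]  # slide the window one base to the right
        best = max(best, running)
    return best / window


# Hypothetical inputs.
h9 = {"t1": (8.0, 2.0), "t2": (1.0, 1.0), "t3": (2.0, 8.0)}
hela = {"t1": (6.0, 2.0), "t2": (1.5, 1.0), "t3": (1.0, 5.0)}
print(ratio_correlation(h9, hela))
print(max_window_mean([0.1] * 30 + [0.9] * 50 + [0.2] * 30))
```

Taking the best window rather than the transcript-wide average keeps a short conserved functional element from being diluted by the rest of a long, mostly unconstrained transcript.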

  17. LincRNAs Are Enriched for Trait-Associated SNPs ● Roughly 50% of all trait-associated SNPs (TASs) identified in genome-wide association studies are located in intergenic sequence; ● Only a small portion are in protein-coding gene exons; ● This supports the hypothesis that intergenic sequence contains an abundance of functional elements. (Figure legend: 95% binomial proportion confidence interval; *P = 0.0173, **P < 2.2E-16.)

  18. LincRNAs Are Enriched for Trait-Associated SNPs ● TASs have been identified within or near noncoding RNAs, including some lincRNAs; ● If lincRNAs are functional, they should be enriched for TASs compared with nonexpressed intergenic regions; ● The paper finds that lincRNAs are more than 5-fold enriched for TASs compared with nonexpressed intergenic regions; ● Hence many trait-associated intergenic regions may function by encoding lincRNAs. (Figure legend: 95% binomial proportion confidence interval; *P = 0.0173, **P < 2.2E-16.)
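The TAS slides show 95% binomial proportion confidence intervals and a fold enrichment, but they do not say which interval method or enrichment formula the paper used. The Python sketch below assumes two things, neither of which is confirmed as the paper's method. The interval is the Wilson score interval. Enrichment is the ratio of TAS densities (TASs per base pair) in lincRNAs versus nonexpressed intergenic regions. The counts are purely hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half, centre + half


def fold_enrichment(tas_a, bp_a, tas_b, bp_b):
    """Ratio of TAS densities (TASs per bp) in region set A versus region set B."""
    return (tas_a * bp_b) / (bp_a * tas_b)


# Hypothetical counts only.
print(wilson_interval(40, 100))
print(fold_enrichment(30, 2_000_000, 6, 2_000_000))  # 5.0
```

The Wilson interval is a common choice over the simple normal approximation because it stays inside [0, 1] and behaves better when counts are small. Normalizing by base pairs keeps the comparison fair when the two region sets differ greatly in total length.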
