Introduction to RNA-Seq David Wood Winter School in Mathematics and Computational Biology July 1, 2013
RNA is... Diverse, Dynamic, Central. [Figure: DNA, Epigenetics, RNA (rRNA, tRNA, mRNA) and Protein; plot of Abundance against Time.] Qualitative, Integrative, Quantitative. Understand the molecular basis of gene function. Classify and transform cellular states.
RNA studies involve... Biological System, Questions, Project, Available Technology, Resources (DB, ~/bin). This talk: focusing on reference-based mammalian RNA-seq analysis.
Transcriptional Complexity. [Figure: gene locus schematic with several transcription start sites (TSS), translation start sites (ATG), polyadenylation signals (pA) and polyadenylated (AAA) transcripts; also labelled: PASR, miRNA, tiRNA, Alu, Mutations, Allelic Expression, RNA Editing. Legend: genomic DNA, spliced intron, protein coding regions, non-coding regions, microRNAs, transcription start site (TSS), translation start site (ATG), polyadenylation signal (pA), polyadenylation (AAA).]
RNA-seq. [Figure: transcript schematic (TSS, ATG, pA, AAA; PASR, miRNA, tiRNA, Alu) annotated with: non-spliced reads, junction reads, mutations, strand specific.] Cloonan et al. Nat Methods 2008; 5:613-619
Advantages of RNA-seq. Discovery: genes, exons, junctions, UTRs, fusions (Present and Future). Dynamic Range: Mortazavi et al. Nat. Methods 2008; 5:621–628. Nucleotide Specific. [Figure: Transcripts (0 to 250,000) by Ensembl Version (58 to 71); series: Protein Coding, Degradation, ncRNA, Pseudogene.]
Typical experiment workflow. Settings: Field / Clinic, Wet Lab, Dry Lab. Steps: Design Experiment; Run Experiment; Sample Acquisition (Field / Clinic / Lab); Obtain RNA; Make Library; Sequencing; 1° Base Calling; 2° Mapping; 2° Library QC; 3° Analysis; 3° Interpretation; Verification; Validation; Publish.
Library Construction. RNA profile of cellular RNA: rRNA (ribosomes) 80%, tRNA 15%, target RNA 5%. Selection options: Deplete rRNA; Enrich polyA; Capture (tiling arrays). Then: Fragment; ds-cDNA synthesis; Ligate adaptors + Amplify; Sequencing.
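The RNA profile on this slide explains why a selection step matters: without it, only about 5% of the library is target RNA. The arithmetic can be sketched as below. This is an illustration only; the function name `target_fraction` and the 95% depletion efficiency are hypothetical values chosen for the example, not figures from the talk.

```python
def target_fraction(profile, removed):
    """Return the share of 'target' RNA after removing part of each class.

    profile: starting percentages per RNA class (taken from the slide).
    removed: fraction of each class removed by the selection step
             (hypothetical efficiencies, supplied by the caller).
    """
    remaining = {
        rna_class: amount * (1.0 - removed.get(rna_class, 0.0))
        for rna_class, amount in profile.items()
    }
    return remaining["target"] / sum(remaining.values())

# RNA profile from the slide: rRNA 80%, tRNA 15%, target RNA 5%.
profile = {"rRNA": 80.0, "tRNA": 15.0, "target": 5.0}

before = target_fraction(profile, removed={})
# Assumed efficiency: rRNA depletion removes 95% of rRNA.
after = target_fraction(profile, removed={"rRNA": 0.95})

print(f"target share before depletion: {before:.3f}")
print(f"target share after depletion:  {after:.3f}")
```

Under these assumed numbers the target share rises from 5% to roughly 21%, because tRNA is left untouched by rRNA depletion.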
RNA-seq Mapping. Challenge #1: Introns. Options: align to a database of junctions or the transcriptome; or use split read alignments. Trapnell et al. Bioinformatics 2009; 25:1105-11. Wood et al. Bioinformatics 2011; 27:580–581. Challenge #2: Correctness. Requires Sufficient Overlap and Sufficient Evidence. Challenge #3: Multi-mappers. Cause: Sequence Similarity. Option: align to the transcriptome.
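The correctness criteria (sufficient overlap, sufficient evidence) and the multi-mapper problem can be sketched as a simple filter over spliced alignments. This is a minimal illustration, not the method of TopHat or of any tool cited here; the record layout, the function name `filter_junctions`, and the thresholds `min_overhang=8` and `min_reads=2` are all hypothetical.

```python
from collections import Counter, namedtuple

# Hypothetical record for one spliced alignment of a read.
# left_overhang / right_overhang: bases aligned on each side of the junction.
# n_hits: number of locations the read aligns to equally well.
SplicedAlignment = namedtuple(
    "SplicedAlignment",
    ["read_id", "junction", "left_overhang", "right_overhang", "n_hits"],
)

def filter_junctions(alignments, min_overhang=8, min_reads=2):
    """Return junctions supported by enough well-anchored, uniquely mapped reads."""
    support = Counter()
    for aln in alignments:
        # Multi-mappers: skip reads with more than one equally good location.
        if aln.n_hits > 1:
            continue
        # Sufficient overlap: require a minimum anchor on both sides.
        if min(aln.left_overhang, aln.right_overhang) < min_overhang:
            continue
        support[aln.junction] += 1
    # Sufficient evidence: require a minimum number of supporting reads.
    return {junction: n for junction, n in support.items() if n >= min_reads}

alignments = [
    SplicedAlignment("r1", ("chr1", 1000, 2000), 25, 25, 1),
    SplicedAlignment("r2", ("chr1", 1000, 2000), 12, 38, 1),
    SplicedAlignment("r3", ("chr1", 1000, 2000), 3, 47, 1),   # anchor too short
    SplicedAlignment("r4", ("chr1", 5000, 6000), 25, 25, 1),  # only one read
    SplicedAlignment("r5", ("chr2", 100, 900), 25, 25, 4),    # multi-mapper
    SplicedAlignment("r6", ("chr2", 100, 900), 20, 30, 3),    # multi-mapper
]

confident = filter_junctions(alignments)
print(confident)
```

Discarding multi-mappers outright is the simplest policy; the slide's alternative, aligning to the transcriptome, aims to resolve some of them instead of dropping them.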
RNA-seq Mapping. Stages: Data QC; Align to Filter Set; Align to 'genome'; Align to 'junctions'; Split read (clipping) Alignment. Actions: Flag and Exclude; Exclude; Choose Alignments, Disambiguate. Tophat: Trapnell et al. Bioinformatics 2009; 25:1105-11. Output: BAM files, used for Alignment Filtering, Analysis and Library QC. Open choices: rRNA, tRNA? reference? diploid? gene model? ESTs? Algorithm?
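The staged layout of this slide (filter set, then genome, then junctions) can be sketched as a loop that assigns each read to the first stage that matches. This is a toy under strong assumptions: exact substring matching stands in for a real aligner, the sequences are made up, and `classify_read` is a hypothetical name. It does not reproduce TopHat's behaviour.

```python
def classify_read(read, stages):
    """Return the name of the first stage whose reference contains the read.

    stages: ordered list of (name, reference_sequence) pairs.
    Exact substring matching is a stand-in for a real alignment step.
    """
    for name, reference in stages:
        if read in reference:
            return name
    return "unmapped"

# Made-up sequences: two exons separated by an intron.
exon1, intron, exon2 = "ACGTAC", "GTTTTTAG", "TGCATG"

stages = [
    ("filter", "GGGCCCGGG"),             # stand-in for an rRNA/tRNA filter set
    ("genome", exon1 + intron + exon2),  # unspliced reference
    ("junction", exon1 + exon2),         # exon-exon junction sequence
]

reads = ["GCCCGG", "CGTACG", "TACTGC", "AAAAAA"]
labels = {read: classify_read(read, stages) for read in reads}

for read, label in labels.items():
    print(read, label)
```

The read `TACTGC` spans the exon-exon boundary, so it is absent from the unspliced reference and is only found at the junction stage. Reads caught by the filter stage would be excluded before any downstream analysis.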
Library Quality Control (QC). RNA profile of cellular RNA: rRNA (ribosomes) 80%, tRNA 15%, target RNA 5%. RNA selection (Deplete rRNA; Enrich polyA; Capture with tiling arrays) affects RNA content (expression quantification). Fragmentation affects insert size (transcript identification). ds-cDNA synthesis affects strand specificity. Remaining steps: Ligate adaptors + Amplify; Sequencing.
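The three library properties named on this slide (RNA content, insert size, strand specificity) each map to a simple summary statistic. The sketch below computes them from toy read records. The dictionary fields, the function name `library_qc` and the example values are hypothetical; a real pipeline would derive them from the BAM files and a gene annotation. The input is assumed to be non-empty and to contain at least one mRNA read.

```python
from statistics import median

def library_qc(reads):
    """Summarise RNA content, strand specificity and insert size for a library."""
    # RNA content: share of reads falling on rRNA.
    rrna = sum(1 for r in reads if r["feature"] == "rRNA")
    # Strand specificity: share of mRNA reads on the same strand as their gene.
    mrna = [r for r in reads if r["feature"] == "mRNA"]
    sense = sum(1 for r in mrna if r["read_strand"] == r["gene_strand"])
    # Insert size: median over all reads with a known insert size.
    inserts = [r["insert_size"] for r in reads if r["insert_size"] is not None]
    return {
        "rrna_fraction": rrna / len(reads),
        "strand_specificity": sense / len(mrna),
        "median_insert_size": median(inserts),
    }

def make_read(feature, read_strand, gene_strand, insert_size):
    """Build one toy read record (hypothetical layout)."""
    return {
        "feature": feature,
        "read_strand": read_strand,
        "gene_strand": gene_strand,
        "insert_size": insert_size,
    }

# Toy library: 2 rRNA reads, 7 sense mRNA reads, 1 antisense mRNA read.
reads = (
    [make_read("rRNA", "+", "+", 150) for _ in range(2)]
    + [make_read("mRNA", "+", "+", 200) for _ in range(7)]
    + [make_read("mRNA", "-", "+", 250)]
)

qc = library_qc(reads)
print(qc)
```

For a stranded protocol the strand specificity should sit close to 1 (or close to 0 if the protocol reports the opposite strand), while an unstranded library would sit near 0.5.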