RNA-seq: Analysis options
Differential Expression Analysis Workflow #1
Requires: ✓ Transcriptome; Genome? (optional)
• Biological samples / library preparation
• Sequence reads (FASTQ); quality control: FASTQC
• Pseudocounts with Kallisto, Sailfish, Salmon (FASTQ + reference transcriptome index)
• Optional alignment to the genome with HISAT2, STAR (+ reference genome index, + known GTF, optional), producing multiple BAMs that are checked with Qualimap
• Quality control: MultiQC
• Count matrix generated using tximport
• DGE with R: DESeq2, edgeR, limma:voom; or DGE / isoform-level DE with R: Sleuth
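The "count matrix generated using tximport" step collapses transcript-level estimates from the pseudocounting tools into gene-level counts. tximport itself is an R package; the Python sketch below only illustrates the core idea, assuming per-sample estimated counts have already been parsed into dictionaries and that a transcript-to-gene mapping (`tx2gene`) is available. The function name `summarize_to_genes` is hypothetical, and the sketch just sums counts per gene: it does not reproduce tximport's length offsets or its other `countsFromAbundance` options.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def summarize_to_genes(
    tx_counts: Dict[str, Dict[str, float]],  # sample -> {transcript_id: estimated count}
    tx2gene: Dict[str, str],                 # transcript_id -> gene_id (assumed mapping)
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Sum transcript-level estimated counts into a gene x sample matrix."""
    samples = sorted(tx_counts)
    gene_totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for sample in samples:
        for tx_id, count in tx_counts[sample].items():
            gene_id = tx2gene.get(tx_id)
            if gene_id is None:
                continue  # transcript absent from the mapping: skipped in this sketch
            gene_totals[gene_id][sample] += count
    genes = sorted(gene_totals)
    # Missing (gene, sample) combinations default to 0.0.
    matrix = [[gene_totals[g][s] for s in samples] for g in genes]
    return genes, samples, matrix


# Toy example: two samples, three transcripts belonging to two genes.
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
counts = {
    "s1": {"tx1": 10.5, "tx2": 4.5, "tx3": 3.0},
    "s2": {"tx1": 2.0, "tx3": 7.0},
}
genes, samples, matrix = summarize_to_genes(counts, tx2gene)
# genes == ["geneA", "geneB"], samples == ["s1", "s2"]
# matrix == [[15.0, 2.0], [3.0, 7.0]]
```

Rows are genes and columns are samples, which is the orientation the DGE packages expect for a count matrix.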
Differential Expression Analysis Workflow #2
Requires: ✓ Genome; ✓ GTF annotation file (transcriptome)
• Sequence reads (FASTQ); quality control: FASTQC
• Alignment to the genome with HISAT2, STAR (+ reference genome index, + known GTF, optional), producing multiple BAMs; quality control: Qualimap
• Count reads associated with genes with htseq-count or featureCounts (+ known GTF); here the count matrix is generated from the BAMs using featureCounts
• Quality control: MultiQC
• DGE with R: DESeq2, edgeR, limma:voom
Course materials: https://hbctraining.github.io/Intro-to-rnaseq-hpc-O2/ (reads to count matrix) and https://hbctraining.github.io/DGE_workshop/ (DGE with R)
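"Count reads associated with genes" means assigning each aligned read to the gene whose annotated features it overlaps. The Python sketch below is a heavily simplified illustration, not how htseq-count or featureCounts are implemented: it assumes reads and genes are already reduced to `(chromosome, start, end)` intervals (1-based, inclusive) and ignores strand, exon structure, spliced alignments, paired-end fragments and multi-mapping reads. A read overlapping exactly one gene is counted; a read overlapping several genes is left uncounted as ambiguous, loosely mirroring both tools' default behaviour. The function names are hypothetical; the `__no_feature` and `__ambiguous` bins borrow htseq-count's special-counter names.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def count_reads(reads: List[Interval], genes: Dict[str, Interval]) -> Counter:
    """Count reads per gene; reads hitting zero or several genes go to special bins."""
    counts: Counter = Counter({gene_id: 0 for gene_id in genes})
    for read in reads:
        # Linear scan over genes; real tools use indexed lookups instead.
        hits = [gene_id for gene_id, span in genes.items() if overlaps(read, span)]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 450, 900)}
reads = [
    ("chr1", 120, 170),   # only geneA
    ("chr1", 460, 510),   # geneA and geneB overlap here -> ambiguous
    ("chr1", 950, 1000),  # no gene
    ("chr1", 600, 650),   # only geneB
]
result = count_reads(reads, genes)
```

Running this per BAM and placing each sample's per-gene counts in its own column gives the count matrix used for DGE.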
Alternative methods: transcriptome assembly
Reference-based assembly
• Genome is known
• Transcriptome is not available or is not good enough
• Cufflinks and Scripture are two reference-based transcriptome assemblers
• Any newly discovered genes or isoforms will need additional annotation
De novo assembly
• Genome is not known, or is of poor quality
• More data is needed than for a reference-based assembly
• Oases, Trans-ABySS and Trinity are examples of well-regarded transcriptome assemblers, especially Trinity
• Newly discovered genes or isoforms will need to be annotated using homology-based and other methods
Figure: Transcriptome assembly, de novo vs reference-based (Martin J.A. and Wang Z., Nat. Rev. Genet. (2011) 12:671–682)
Differential Expression Analysis Workflow #3 (reference-based transcriptome assembly)
• Sequence reads; quality control: FASTQC
• Alignment to the genome with HISAT2, STAR
• Reference-based transcriptome assembly
• Merge assemblies from all samples
• Annotate the genes/transcripts
• Pseudocounts with Kallisto, Sailfish, Salmon
• Count matrix generated using tximport
• DGE with R: DESeq2, edgeR, limma:voom; or DGE / isoform-level DE with R: Sleuth
Differential Expression Analysis Workflow #4 (de novo transcriptome assembly)
• Sequence reads; quality control: FASTQC
• De novo assembly with Trinity
• Annotate the genes/transcripts
• Pseudocounts with Kallisto, Sailfish, Salmon
• Count matrix generated using tximport
• DGE with R: DESeq2, edgeR, limma:voom; or DGE / isoform-level DE with R: Sleuth
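Every workflow ends by handing a gene-by-sample count matrix to DESeq2, edgeR or limma:voom. As one illustration of what happens next, DESeq2 corrects for sequencing depth with "median of ratios" size factors: each count is divided by that gene's geometric mean across samples, and a sample's size factor is the median of those ratios over genes with no zero counts. The Python sketch below (hypothetical function name `size_factors`) follows that formula; it is not DESeq2's code, and edgeR's default normalization (TMM) works differently.

```python
import math
from statistics import median
from typing import List


def size_factors(counts: List[List[float]]) -> List[float]:
    """Median-of-ratios size factors; counts is a gene x sample matrix (rows = genes)."""
    n_samples = len(counts[0])
    # Only genes with a non-zero count in every sample have a finite log geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every gene has at least one zero count")
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # log(count / geometric mean) for each usable gene in sample j
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy matrix: sample 2 was sequenced twice as deeply as sample 1;
# the last gene has a zero and is excluded from the estimate.
counts = [[10, 20], [20, 40], [30, 60], [0, 5]]
sf = size_factors(counts)  # approximately [0.7071, 1.4142]
normalized = [[c / s for c, s in zip(row, sf)] for row in counts]
```

After dividing by the size factors, the first three genes have identical normalized counts in both samples, which is the intended effect of depth normalization.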
These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.