RNA-seq: Analysis options


  1. RNA-seq: Analysis options

  3. Differential Expression Analysis Workflow #1
     • Reference: transcriptome ✓; genome optional ("Genome?")
     • Biological samples / library preparation
     • Sequence reads (FASTQ)
     • Quality control: FastQC
     • FASTQ (+ reference transcriptome index): pseudocounts with Kallisto, Sailfish, Salmon
     • Count matrix generated using tximport
     • DGE with R: DESeq2, edgeR, limma-voom
     • DGE or isoform-level DE with R: Sleuth
     • FASTQ (+ reference genome index; + known GTF, optional): alignment to genome with HISAT2, STAR, giving multiple BAMs
     • Quality control: Qualimap (on the BAMs)
     • Quality control: MultiQC
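
The pseudocount step of Workflow #1 can be sketched as a small command builder. This is a minimal sketch that assumes Salmon is the chosen tool and that reads are paired-end; the function name, sample names, and all paths are hypothetical placeholders, and the commands are only assembled, not executed.

```python
from pathlib import Path


def salmon_commands(transcripts_fasta, index_dir, samples, out_root, threads=4):
    """Build (but do not run) Salmon index and quant commands.

    `samples` maps a sample name to a (read1, read2) pair of FASTQ paths.
    Every path passed in is a hypothetical placeholder.
    """
    # One indexing command for the reference transcriptome.
    commands = [
        ["salmon", "index", "-t", str(transcripts_fasta), "-i", str(index_dir)]
    ]
    # One quantification command per sample, in a stable (sorted) order.
    for name, (read1, read2) in sorted(samples.items()):
        commands.append([
            "salmon", "quant",
            "-i", str(index_dir),
            "-l", "A",                # let Salmon infer the library type
            "-1", str(read1),
            "-2", str(read2),
            "-p", str(threads),
            "-o", str(Path(out_root) / name),
        ])
    return commands


cmds = salmon_commands(
    "transcripts.fa",
    "salmon_index",
    {"ctrl_1": ("ctrl_1_R1.fastq.gz", "ctrl_1_R2.fastq.gz")},
    "quant",
)
for cmd in cmds:
    print(" ".join(cmd))
```

Each inner list can be passed to `subprocess.run` once the tool is installed. The per-sample output directories are what tximport would later read to build the count matrix.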

  6. Differential Expression Analysis Workflow #2
     • Reference: genome ✓; GTF annotation file (transcriptome) ✓
     • Sequence reads (FASTQ)
     • Quality control: FastQC
     • FASTQ (+ reference genome index; + known GTF, optional): alignment to genome with HISAT2, STAR, giving multiple BAMs
     • Quality control: Qualimap (on the BAMs)
     • Multiple BAMs (+ known GTF): count reads associated with genes with htseq-count, featureCounts
     • Quality control: MultiQC
     • Count matrix generated from BAM using featureCounts
     • DGE with R: DESeq2, edgeR, limma-voom
     • Linked course materials: https://hbctraining.github.io/Intro-to-rnaseq-hpc-O2/ and https://hbctraining.github.io/DGE_workshop/
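
To make "count matrix" concrete, here is a minimal sketch that reads a featureCounts-style table into Python. It assumes the usual tab-delimited layout (a leading `#` comment line, then six annotation columns `Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length` followed by one column per BAM file); the gene names, sample names, and counts in the example are made up for illustration.

```python
import csv
import io


def read_count_matrix(handle):
    """Parse a featureCounts-style table into (sample_names, {gene: [counts]})."""
    # Skip comment lines such as the "# Program:featureCounts ..." header.
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # Assumed layout: the first six columns are annotation, the rest are samples.
    samples = header[6:]
    counts = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        counts[row[0]] = [int(value) for value in row[6:]]
    return samples, counts


# Hypothetical two-gene, two-sample table.
example = (
    "# Program:featureCounts (hypothetical header line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tctrl_1.bam\ttreat_1.bam\n"
    "geneA\tchr1\t100\t900\t+\t801\t12\t30\n"
    "geneB\tchr2\t50\t450\t-\t401\t0\t7\n"
)
samples, counts = read_count_matrix(io.StringIO(example))
print(samples)
print(counts)
```

The result is a genes-by-samples table of raw integer counts, which is the kind of input DESeq2, edgeR, and limma-voom expect in the DGE step.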

  10. Alternative methods: transcriptome assembly (reference-based assembly)
      • Genome is known
      • Transcriptome not available or is not good enough
      • Cufflinks and Scripture are two reference-based transcriptome assemblers
      • Additional annotation of any newly-discovered genes or isoforms will need to be generated

  14. Alternative methods: transcriptome assembly (de novo assembly)
      • Genome is not known, or is of poor quality
      • Amount of data needed is greater than for a reference-based assembly
      • Oases, Trans-ABySS, and Trinity are well-regarded de novo transcriptome assemblers, Trinity especially
      • Newly-discovered genes or isoforms will need to be annotated using homolog-based and other methodologies

  15. Transcriptome Assembly [Figure: de novo assembly and reference-based assembly.] Source: Martin J.A. and Wang Z., Nat. Rev. Genet. (2011) 12:671–682

  16. Differential Expression Analysis Workflow #3
      • Sequence reads
      • Quality control: FastQC
      • Alignment to genome: HISAT2, STAR
      • Reference-based transcriptome assembly
      • Merge assemblies from all samples
      • Annotate the genes/transcripts
      • Pseudocounts with Kallisto, Sailfish, Salmon
      • Count matrix generated using tximport
      • DGE with R: DESeq2, edgeR, limma-voom
      • DGE or isoform-level DE with R: Sleuth

  17. Differential Expression Analysis Workflow #4
      • Sequence reads
      • Quality control: FastQC
      • De novo assembly with Trinity
      • Annotate the genes/transcripts
      • Pseudocounts with Kallisto, Sailfish, Salmon
      • Count matrix generated using tximport
      • DGE with R: DESeq2, edgeR, limma-voom
      • DGE or isoform-level DE with R: Sleuth
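
The choice between the four workflows can be summarized as a small decision helper. This is only one reading of the slides, not a rule stated in them: "available" is taken to mean present and of good enough quality, and the function name and return convention are hypothetical.

```python
def viable_workflows(genome_available, transcriptome_available):
    """Return the workflow numbers (1-4) that fit the available references.

    Assumed reading of the slides:
      #1 needs a reference transcriptome (genome optional),
      #2 needs a genome plus a GTF annotation (transcriptome),
      #3 applies when the genome is known but the transcriptome is missing
         or not good enough (reference-based assembly),
      #4 applies when neither is usable (de novo assembly).
    """
    options = []
    if transcriptome_available:
        options.append(1)  # pseudocounts against the transcriptome
    if genome_available and transcriptome_available:
        options.append(2)  # genome alignment, then read counting with the GTF
    if genome_available and not transcriptome_available:
        options.append(3)  # reference-based transcriptome assembly first
    if not genome_available and not transcriptome_available:
        options.append(4)  # de novo assembly, for example with Trinity
    return options


for genome in (True, False):
    for transcriptome in (True, False):
        print(genome, transcriptome, viable_workflows(genome, transcriptome))
```

When both references are available, Workflows #1 and #2 are both returned, since the slides present them as alternatives rather than ranking one above the other.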

  18. These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
