QC of scRNAseq data Åsa Björklund asa.bjorklund@scilifelab.se
Outline • Background on transcriptional bursting • Experimental setup – what could go wrong? • Spike-in RNAs • Quality control metrics • PCA for quality control
Transcriptional bursting • Burst frequency and size are correlated with mRNA abundance • Many TFs have low mean expression (and low burst frequency) and will only be detected in a fraction of the cells (Suter et al. Science 2011)
Transcript drop-out vs bursting • Drop-out: a transcript is present in the cell but is not converted to cDNA and therefore not detected. • Transcriptional bursting: a transcript is expressed in most cells of the cell type, but not in every cell. • Lowly expressed transcripts have a lower chance of detection and most likely a low burst frequency, so it is hard to distinguish drop-out from bursting.
Experimental setup Workflow: Cell dissociation -> Single cell capture -> Single cell lysis -> Reverse transcription -> Preamplification -> Library preparation and sequencing • It is critical to have healthy whole cells with no RNA leakage. • Tissues can be dissolved with mechanical methods, detergents or enzymatic digestion. • Keep the time from dissociation to cell capture short to reduce the effect on transcriptional state. PROBLEMS: • Incomplete dissociation can give multiple cells sticking together. • Too harsh lysis may damage the cells -> RNA degradation and RNA leakage • Different lysis conditions may/may not give nuclear lysis. (Kolodziejczyk et al. 2015)
Experimental setup • Tissues that are hard to dissociate: laser capture microdissection (LCM), nuclei sorting. PROBLEMS: • All these methods may give rise to empty wells/droplets, and also duplicates or multiples of cells. • Long sorting times may damage the cells. (Kolodziejczyk et al. 2015)
Experimental setup • Efficiency of reverse transcription is the key to high sensitivity. • Drop-out rate is around 60-90% depending on the method used. • Two libraries made with the same method from the same cell type may have very different drop-out rates. (Kolodziejczyk et al. 2015)
Experimental setup • Any amplification step will introduce a bias in the data. • Methods that use UMIs will control for this to a large extent, but a transcript that is amplified more still has a higher chance of being detected. • Full-length methods like SmartSeq2 have no UMIs, so we cannot control for amplification bias. (Kolodziejczyk et al. 2015)
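A minimal sketch of how UMIs control for amplification bias: reads that share the same cell barcode, gene and UMI are counted as one original molecule. The function name `count_molecules` and the toy reads are hypothetical, and the sketch assumes error-free UMIs (real pipelines also merge UMIs that differ by sequencing errors).

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse reads to unique (cell, gene, UMI) combinations.

    `reads` is an iterable of (cell_barcode, gene, umi) tuples, one per
    aligned read. Returns {cell: {gene: number_of_unique_umis}}.
    """
    umis_seen = defaultdict(set)
    for cell, gene, umi in reads:
        umis_seen[(cell, gene)].add(umi)

    counts = defaultdict(dict)
    for (cell, gene), umis in umis_seen.items():
        counts[cell][gene] = len(umis)
    return dict(counts)


# Hypothetical toy input: three Actb reads share one UMI (PCR duplicates).
reads = [
    ("cellA", "Actb", "AACG"),
    ("cellA", "Actb", "AACG"),
    ("cellA", "Actb", "AACG"),
    ("cellA", "Actb", "TTGC"),
    ("cellA", "Sox2", "GGAT"),
]
umi_counts = count_molecules(reads)
```

Here four Actb reads collapse to two molecules. A molecule that was never amplified enough to be sequenced is still missed, which is the residual bias the slide mentions.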
Experimental setup • Multiplexing of samples will not always be perfect, so the number of reads per cell may vary quite a lot. • Base calls in the sequencing may be affected by a number of factors: – Low complexity of the library, which may be an issue when there are many primer dimers – Base call quality scores may be affected if there are contaminations in the flow cell (Kolodziejczyk et al. 2015)
Problems compared to bulk RNA-seq • Amplification bias • Drop-out rates • Transcriptional bursting • Background noise • Bias due to cell-cycle, cell size and other factors (Kharchenko et al. Nature Methods 2014)
Spike-in RNAs • Addition of external controls • ERCC spike-in most widely used, consists of 48 or 96 mRNAs at 17 different concentrations. • Important to add equal amounts to each cell, preferably in the lysis buffer. (Vallejos et al. PLOS Comp Biol 2015)
Spike-in RNAs Spike-ins can be used to model: • Technical noise & drop-out rates • Starting amount of RNA in the cell • Data normalization (Vallejos et al. PLOS Comp Biol 2015)
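As an illustration of spike-in based normalization, the sketch below scales each cell by its total spike-in count. This is plain scaling chosen for brevity, not the statistical model of the cited paper; the function name `spikein_size_factors` and the toy matrices are hypothetical. It relies on the assumption stated on the previous slide that equal amounts of spike-in were added to every cell.

```python
import numpy as np


def spikein_size_factors(spike_counts):
    """Per-cell size factors from spike-in counts.

    spike_counts: array of shape (n_spikeins, n_cells).
    Each factor is the cell's total spike-in count divided by the mean
    total across cells, so the factors average to 1.
    """
    totals = np.asarray(spike_counts, dtype=float).sum(axis=0)
    if np.any(totals == 0):
        raise ValueError("a cell has no spike-in counts and cannot be scaled")
    return totals / totals.mean()


# Hypothetical toy data: 2 spike-ins and 2 genes measured in 3 cells.
spike_counts = np.array([[10, 20, 30],
                         [5, 10, 15]])
gene_counts = np.array([[100, 200, 300],
                        [0, 40, 30]])

factors = spikein_size_factors(spike_counts)
normalized = gene_counts / factors  # divide each cell (column) by its factor
```

In this toy case the first gene differs between cells only because of capture efficiency, so it becomes constant after scaling.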
Spike-in RNAs (Tung et al. Scientific Reports 2017)
Spike-in RNAs • Finding biologically variable genes • Squared coefficient of variation: $\mathrm{CV}^2 = (\text{standard deviation} / \text{mean})^2$ (Brennecke et al. Nature Methods 2013)
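The squared coefficient of variation can be computed per gene as in the sketch below. It only computes the quantity itself; comparing genes against the technical noise level estimated from spike-ins is not shown. The function name `squared_cv` and the toy counts are hypothetical, and the choice of the sample variance (`ddof=1`) is an assumption.

```python
import numpy as np


def squared_cv(counts):
    """Squared coefficient of variation per gene.

    counts: array of shape (n_genes, n_cells).
    Returns variance / mean^2 for each gene, NaN where the mean is zero.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)  # sample variance (an assumption)

    cv2 = np.full(mean.shape, np.nan)
    expressed = mean > 0
    cv2[expressed] = var[expressed] / mean[expressed] ** 2
    return cv2


# Hypothetical toy data: 3 genes in 3 cells.
counts = np.array([[2, 4, 6],   # variable gene
                   [5, 5, 5],   # constant gene
                   [0, 0, 0]])  # undetected gene
cv2 = squared_cv(counts)
```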
QC-metrics – Mapping statistics (% uniquely mapping) – Fraction of exon mapping reads – 3’ bias – for full length methods like SS2 – mRNA-mapping reads – Number of detected genes – Spike-in detection – Mitochondrial read fraction – rRNA read fraction – Pairwise correlation to other cells
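Several of these metrics can be derived directly from a gene-by-cell count matrix. The sketch below computes total counts, number of detected genes and mitochondrial fraction per cell. The function name `per_cell_qc`, the `mt-` gene-name prefix and the toy data are assumptions; mapping statistics and 3' bias need the alignments and are not covered here.

```python
import numpy as np


def per_cell_qc(counts, gene_names, mito_prefix="mt-"):
    """Basic per-cell QC metrics from a (n_genes, n_cells) count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum(axis=0)
    detected = (counts > 0).sum(axis=0)

    # Assumption: mitochondrial genes are recognisable by a name prefix.
    is_mito = np.array(
        [name.lower().startswith(mito_prefix.lower()) for name in gene_names]
    )
    mito = counts[is_mito].sum(axis=0)
    # Cells with zero total counts get a fraction of 0 instead of NaN.
    mito_fraction = np.divide(
        mito, total, out=np.zeros_like(total), where=total > 0
    )
    return {
        "total_counts": total,
        "detected_genes": detected,
        "mito_fraction": mito_fraction,
    }


# Hypothetical toy data: 4 genes in 2 cells.
gene_names = ["Actb", "Gapdh", "mt-Co1", "mt-Nd1"]
counts = np.array([[50, 0],
                   [30, 10],
                   [15, 60],
                   [5, 30]])
qc = per_cell_qc(counts, gene_names)
```

The second toy cell has most of its counts in mitochondrial genes and fewer detected genes, the kind of profile that would be inspected more closely.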
QC-metrics – Number of reads – Mapping statistics (% uniquely mapping) – Fraction of exon mapping reads – mRNA-mapping reads (vs other types of genes like rRNA, sRNA, non-coding, pseudogenes etc.) A low number of reads may mean there is not enough information for that cell. Bad mapping may be an indication of a failed library prep. Low content of mRNAs will lead to more primer dimers, more spurious mapping and fewer mapping reads.
QC-metrics – 3’ bias (degraded RNA) – for full length methods like SS2 [Figure: read coverage along the gene body for a not degraded and a degraded sample; x-axis: percentile of gene body (5' -> 3'), y-axis: read number] Look at the proportion of reads that map to the 10-20% most 3’ end of the transcript.
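Given a gene-body coverage profile like the one in the figure, the 3' proportion can be sketched as below. The function name `three_prime_fraction`, the 100-bin profile and the 20% tail are assumptions for illustration; producing the coverage profile from alignments is not shown.

```python
import numpy as np


def three_prime_fraction(coverage, tail=0.2):
    """Fraction of reads falling in the most 3' part of the gene body.

    coverage: read counts per gene-body percentile bin, ordered 5' -> 3'.
    tail: fraction of the gene body counted as the 3' end.
    """
    coverage = np.asarray(coverage, dtype=float)
    n_tail = max(1, int(round(len(coverage) * tail)))
    total = coverage.sum()
    if total == 0:
        return float("nan")
    return float(coverage[-n_tail:].sum() / total)


# Hypothetical profiles over 100 percentile bins.
even = np.ones(100)               # uniform coverage
skewed = np.linspace(1, 9, 100)   # coverage rising towards the 3' end

even_fraction = three_prime_fraction(even)
skewed_fraction = three_prime_fraction(skewed)
```

Uniform coverage gives a fraction equal to the tail size, so values well above that suggest 3' bias.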
QC-metrics – Spike-in detection – Spike-in ratio If the number of detected spike-in molecules is low, the library has clearly failed. The proportion of cell reads to spike-in reads is an indication of the starting amount of RNA from the cell. A low amount of cell RNA can indicate breakage or just a smaller cell.
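Both spike-in metrics can be computed from a count matrix that contains spike-in rows, as in this sketch. The function name `spikein_summary`, the boolean mask `is_spikein` and the toy data are hypothetical; how spike-in rows are identified (for example by an "ERCC-" name prefix) depends on the annotation used.

```python
import numpy as np


def spikein_summary(counts, is_spikein):
    """Spike-in fraction and number of detected spike-ins per cell.

    counts: array of shape (n_features, n_cells), genes and spike-ins.
    is_spikein: boolean mask over the rows marking spike-in features.
    """
    counts = np.asarray(counts, dtype=float)
    is_spikein = np.asarray(is_spikein, dtype=bool)

    spike = counts[is_spikein]
    total = counts.sum(axis=0)
    spike_fraction = np.divide(
        spike.sum(axis=0), total,
        out=np.full(total.shape, np.nan), where=total > 0,
    )
    spike_detected = (spike > 0).sum(axis=0)
    return spike_fraction, spike_detected


# Hypothetical toy data: 2 genes and 2 spike-ins in 3 cells.
counts = np.array([[80, 5, 50],
                   [100, 5, 50],
                   [10, 45, 0],
                   [10, 45, 0]])
is_spikein = [False, False, True, True]
spike_fraction, spike_detected = spikein_summary(counts, is_spikein)
```

In the toy data the second cell has a high spike-in fraction (little cell RNA) and the third cell has no spike-ins detected.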
QC-metrics – Number of detected genes The number of detected genes clearly correlates with the size of the cells, so be careful if you are working with cells of very varying sizes. A high number of detected genes may be an indication of duplicate/multiple cells.
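A simple way to act on this metric is to flag cells outside a chosen range, sketched below. The function name `flag_by_detected_genes` and the cutoffs are hypothetical; thresholds need to be chosen per dataset and per cell type, precisely because the number of detected genes follows cell size.

```python
import numpy as np


def flag_by_detected_genes(detected, low, high):
    """Label each cell as 'low', 'ok' or 'high' by number of detected genes."""
    detected = np.asarray(detected)
    return np.where(
        detected < low, "low", np.where(detected > high, "high", "ok")
    )


# Hypothetical numbers of detected genes for four cells.
detected = np.array([300, 4200, 5100, 11000])
flags = flag_by_detected_genes(detected, low=1000, high=9000)
```

Cells flagged "high" are candidates for duplicates/multiples, and cells flagged "low" for failed or empty libraries, but a flag on its own is only a prompt to look at the other metrics.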