Comparison of RNA sequencing with 19,319 lab-validated RT-qPCR assays • Jan Hellemans, PhD • London, UK • October 20-21, 2014
Acknowledgements: Biogazelle team & collaborators • Biogazelle • Ghent University • Steve Lefever • SEQC consortium • Christopher Mason • David Kreil • Leming Shi • Bio-Rad
Introduction • qPCR: reference technology for nucleic acid quantification • sensitivity and specificity • wide dynamic range • speed • relatively low cost • conceptual and practical simplicity • easy to perform ≠ easy to do it right • many steps involved • all need to be right
Assays & MIQE • design • amplicon length • primer positions (exonic or intron-spanning) • transcript coverage • in silico verification • specificity prediction (retropseudogenes and other homologues) • secondary structure analysis • empirical (wet lab) validation • specificity assessment (gel, melt, amplicon sequencing) • Cq of NTC (for SYBR assays) • amplification efficiency determination (slope, E, SE(E), r²)
The perfect assay: properties • specific for the gene of interest (no off-target amplification) • detection of all transcript variants • detection not affected by polymorphisms (no allelic bias or drop out) • amplification efficiency ~100% • no gDNA co-amplification • no primer dimer formation
The perfect assay ... or the best possible • For some genes, there is no perfect assay • no unique sequence (homology with other genes, pseudogenes) • no common sequence among all transcripts • regions are excluded because of repeats, secondary structures, SNPs, homology, ... • Make the best possible compromise and report potential issues • Design → in silico quality control → lab validation
Assay design using primerXL • database of genomic information (transcripts, SNPs, ...) • tools for target region selection (maximize transcript coverage) • primer3 design engine • analysis of secondary structures and SNPs in primer annealing regions • specificity prediction (BiSearch) • relaxation cascade (from perfect to best possible)
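The "relaxation cascade" bullet can be pictured as a loop over progressively looser constraint sets that stops at the first set yielding a design. The sketch below is only an illustration of that idea, not primerXL's actual implementation: the constraint names, the `design_with_relaxation` function and the `toy_designer` stand-in are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def design_with_relaxation(
    levels: Sequence[dict],
    try_design: Callable[[dict], Optional[str]],
):
    """Return the first design found plus the rank and constraints that produced it."""
    for rank, constraints in enumerate(levels):
        design = try_design(constraints)
        if design is not None:
            return design, rank, constraints
    # No level produced a design
    return None, None, None


# Hypothetical constraint levels, strictest ("perfect") first
levels = [
    {"all_transcripts": True, "allow_snp_in_primer": False},
    {"all_transcripts": False, "allow_snp_in_primer": False},
    {"all_transcripts": False, "allow_snp_in_primer": True},
]


def toy_designer(constraints: dict) -> Optional[str]:
    # Stand-in for a real design engine: it only succeeds once
    # full transcript coverage is no longer required.
    return None if constraints["all_transcripts"] else "toy-primer-pair"


design, rank, used = design_with_relaxation(levels, toy_designer)
print(design, rank, used)
```

Recording the rank of the level that succeeded makes it possible to report which compromise was made for a given assay.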
BiSearch specificity prediction • [figure panels: BiSearch loose, BiSearch strict] • only the gene of interest (FFAR2)
reads | seq | gene_list | official_symbol | location
2843 | CATGGCAGTCACCATCTTCTGCTACTGGCGTTTTGTGTGGATCATGCTCTCCCAGCCCCTTGTGGGGGCCCAGAGGCGGCGCCGAGCCGTGGGGCTGGCTGTGGTGACGCTGCTCAATTTCCTGGTGTGCTTCGGACCTTACAGATCGGAA | ENSG00000126262 | FFAR2 | 19:35940617-35942667
1897 | GTAAGGTCCGAAGCACACCAGGAAATTGAGCAGCGTCACCACAGCCAGCCCCACGGCTCGGCGCCGCCTCTGGGCCCCCACAAGGGGCTGGGAGAGCATGATCCACACAAAACGCCAGTAGCAGAAGATGGTGACTGCCATGAGATCGGAA | ENSG00000126262 | FFAR2 | 19:35940617-35942667
1535 | GTAAGGTCCGAAGCACACCGAGAGCTGGGAGCAGGAGCTACACAGTCTGCTGGCCTCACTGCACACCCTGCTGGGGGCCCTGTACGAGGGAGCAGAGACTGCTCCTGTGCAGAATGAAGGCCCTGGGGTGGAGATGCTGCTGTCCTCAGAA | ENSG00000141456 | AC091153.1 | 17:4574680-4607632
1097 | CATGGCAGTCACCATCTTCTGAGGACAGCAGCATCTCCACCCCAGGGCCTTCATTCTGCACAGGAGCAGTCTCTGCTCCCTCGTACAGGGCCCCCAGCAGGGTGTGCAGTGAGGCCAGCAGACTGTGTAGCTCCTGCTCCCAGCTCTCGG | ENSG00000141456 | AC091153.1 | 17:4574680-4607632
1091 | CATGGCAGTCACCATCTTCTGAGGACAGCAGCATCTCCACCCCAGGGCCTTCATTCTGCACAGGAGCAGTCTCTGCTCCCTCGTACAGGGCCCCCAGCAGGGTGTGCAGTGAGGCCAGCAGACTGTGTAGCTCCTGCTCCCAGCTCTCGGT | ENSG00000141456 | AC091153.1 | 17:4574680-4607632
Wet lab validation setup • PCR composition • total volume: 5 µl • instrument: CFX384 (with automation) • mastermix: SsoAdvanced SYBR • primer conc: 250 nM each • PCR program • default cycling protocol for SsoAdvanced SYBR (Ta = 60°C) • Samples • cDNA: 25 ng (total RNA equivalents; Agilent Universal human reference RNA = MAQC A) • gDNA: 2.5 ng (Roche) • NTC: water + carrier (5 ng/µl yeast transfer RNA) • synthetic template (pooled 60-mers in concentration range: 20 million down to 20 copies)
Wet lab validation: some numbers • lab validation of 103 053 assays (human, mouse and rat coding genes) • 1 456 142 reactions • 3 822 PCR plates (384-well) • equivalent to 15 288 PCR plates (96-well) • [figure label: 305 m]
Amplification efficiency: synthetic templates • initial publication: Vermeulen et al., Nucleic Acids Research, 2009 • Biogazelle approach (easy & cost effective) • 60-mer: 30 nt 5' + 30 nt 3' • no modifications, standard desalted • 7-point dilution series: 20 000 000 down to 20 molecules • equivalent to full length double stranded template
metric | ds template | ss oligo
r² < 0.99 | 1 | 1
median E | 2.00 | 2.01
average E | 2.00 | 2.01
count E outside [1.90-2.10] | 1 | 3
paired t-test p-value | 0.14
• limitation: the behavior of the first cycles when amplifying from cDNA is not evaluated
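As a rough illustration of how efficiency is derived from such a dilution series, the sketch below fits Cq against log10(input copies) and converts the slope with the standard relation E = 10^(-1/slope), so that E = 2.00 corresponds to 100% efficiency. The function name and the Cq values are invented for the example (perfect doubling is assumed); they are not measurements from this study.

```python
import math


def fit_standard_curve(copies, cq):
    """Least-squares fit of Cq against log10(input copies)."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(cq) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in cq)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    # Amplification base per cycle: 2.00 means perfect doubling (100%)
    efficiency = 10 ** (-1.0 / slope)
    return slope, intercept, efficiency, r2


# 7-point, 10-fold dilution series: 20 000 000 down to 20 molecules
copies = [2 * 10 ** k for k in range(7, 0, -1)]

# Hypothetical Cq values assuming perfect doubling every cycle
cycles_per_tenfold = 1 / math.log10(2)  # about 3.32 cycles
cq = [12.0 + cycles_per_tenfold * i for i in range(7)]

slope, intercept, efficiency, r2 = fit_standard_curve(copies, cq)
print(f"slope={slope:.3f} E={efficiency:.2f} r2={r2:.4f}")
```

With real data, an assay would be flagged when E falls outside a window such as [1.90-2.10] or when r² drops below 0.99, the two criteria listed in the table above.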
Amplification efficiency distribution (n = 50 133) • [figure: efficiency histogram; labels, left to right: redesign, 89%, redesign]
Specificity: NGS for increased sensitivity • amplicon sizing (+ melt analysis for SYBR assays) • limited sensitivity for detecting low level non-specific coamplification • failure to observe non-specific amplification of sequences with similar size and/or Tm, e.g. expressed pseudogenes or homologous genes • Next level of specificity assessment • in silico specificity predictions by BiSearch • massively parallel sequencing of pooled PCR products • average coverage > 1000-fold → lab specificity > 99.9% • 50 to 200 times more sensitive than size analysis and Sanger sequencing
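One way to read the "coverage > 1000-fold → specificity > 99.9%" bullet: with at least 1000 reads per amplicon, a single off-target read is the smallest contamination that can be seen, i.e. 0.1%. The sketch below computes an on-target fraction from per-gene read counts under that reading. The function names are hypothetical, and the read counts are borrowed from the FFAR2 reads table in this deck, assumed here (without confirmation from the slides) to belong to a single assay.

```python
from collections import defaultdict


def on_target_fraction(read_counts, target_gene):
    """Fraction of mapped amplicon reads assigned to the intended gene."""
    per_gene = defaultdict(int)
    for gene, reads in read_counts:
        per_gene[gene] += reads
    total = sum(per_gene.values())
    return per_gene[target_gene] / total if total else float("nan")


def smallest_detectable_fraction(coverage):
    """One off-target read out of `coverage` reads is the smallest visible fraction."""
    return 1.0 / coverage


# Illustrative (gene, reads) rows; assumed to come from one assay
rows = [
    ("FFAR2", 2843),
    ("FFAR2", 1897),
    ("AC091153.1", 1535),
    ("AC091153.1", 1097),
    ("AC091153.1", 1091),
]

fraction = on_target_fraction(rows, "FFAR2")
limit = smallest_detectable_fraction(1000)
print(f"on-target: {fraction:.1%}, detection limit at 1000x: {limit:.1%}")
```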
Specificity: most assays are 100% on-target
Specificity: 2/3 of non-specific assays may go unnoticed without NGS • [figure: % on-target (0% to 100%) against a second axis running 0% to 60%; legend bins from 0 < x < 0.1 up to 0.9 < x < 1, plus 100%]
Specificity: the power of in silico verification
category | assays | share
perfect | 60 293 | 86%
acceptable (<10% non-specific) | 5 866 | 8%
predicted non-specificity (no specific design found) | 1 204 | 2%
failing specificity QC criteria | 2 467 | 4%
MIQE compliant PrimePCR assay validation data sheet
Dynamic range • > 10 000 000 fold • [figure: gene count (0 to 5000) per expression bin, in copies per cell (0.001 to 16 777.216 in 2-fold steps), for human, mouse and rat]
SEQC • multisite, cross-platform analysis of RNAseq • FDA sponsored and guided MAQC-III • Nature Biotechnology, Sept 2014: Focus on RNA sequencing quality control (SEQC), 2 Biogazelle co-authors • MAQC samples: reference RNA with built-in controls (known truths) • > 100 billion reads • compared against qPCR (PrimePCR)
RNAseq vs PrimePCR: differential expression • 454: 0.83 (13,190 genes) • ILMN: 0.89 (16,264 genes) • PGM: 0.86 (14,981 genes) • PRO: 0.89 (16,242 genes)
qPCR (PrimePCR) vs RNAseq (Illumina) • r² = 75% for genes detected by both platforms
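For readers who want to reproduce this kind of figure on their own data, a minimal sketch follows: it restricts the comparison to genes with a value on both platforms and reports the squared Pearson correlation. The function name and the expression values are hypothetical, and the slides do not say which expression scale was used; log-scale values are assumed here.

```python
def r_squared_shared(qpcr, rnaseq):
    """Squared Pearson correlation over genes measured on both platforms."""
    shared = sorted(set(qpcr) & set(rnaseq))
    x = [qpcr[g] for g in shared]
    y = [rnaseq[g] for g in shared]
    n = len(shared)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy), n


# Hypothetical log-scale expression values per gene
qpcr = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 9.0}   # E: qPCR only
rnaseq = {"A": 1.0, "B": 3.0, "C": 2.0, "D": 4.0, "F": 1.0}  # F: RNAseq only

r2, n_shared = r_squared_shared(qpcr, rnaseq)
print(f"r2={r2:.2f} over {n_shared} shared genes")
```

Genes seen by only one platform (E and F in the toy data) are dropped before the correlation is computed, which is why the number of shared genes should be reported alongside r².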
Saturation analysis: ABRF-NGS dataset
preparation | sample | libraries | reads | GENCODE12 mapping | PrimePCR mapping
ribo-depleted | MAQC A | 22 | 5 304 M | 1 955 M (37%) | 1 692 M (32%)
ribo-depleted | MAQC B | 17 | 3 370 M | 1 447 M (43%) | 1 193 M (35%)
poly-A-enriched | MAQC A | 4 | 427 M | 291 M (68%) | 278 M (65%)
poly-A-enriched | MAQC B | 4 | 446 M | 323 M (72%) | 297 M (67%)
Saturation analysis: ribo-depletion • [figure: RNAseq, % of GENCODE12 (0% to 100%) against read depth, halving from 4 096 000 000 down to 125 000 reads; series: MAQC A detection, MAQC B detection, MAQC A quantification, MAQC B quantification]
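A saturation curve of this kind can be approximated by subsampling reads at successively halved depths and counting how many genes still pass a threshold. The sketch below is a toy version under stated assumptions: the gene abundances are invented, and the thresholds (at least 1 read for "detection", at least 10 reads for "quantification") are placeholders, since the slides do not give the criteria actually used.

```python
import random


def saturation_curve(gene_reads, depths, detect_min=1, quant_min=10, seed=0):
    """Subsample reads without replacement; report share of genes passing each threshold."""
    rng = random.Random(seed)
    # One list entry per read, labelled with its gene
    pool = [gene for gene, n in gene_reads.items() for _ in range(n)]
    n_genes = len(gene_reads)
    curve = []
    for depth in depths:
        sample = rng.sample(pool, min(depth, len(pool)))
        counts = {}
        for gene in sample:
            counts[gene] = counts.get(gene, 0) + 1
        detected = sum(1 for c in counts.values() if c >= detect_min)
        quantified = sum(1 for c in counts.values() if c >= quant_min)
        curve.append((depth, detected / n_genes, quantified / n_genes))
    return curve


# Hypothetical library: 200 genes with read counts from 5000 down to 25
gene_reads = {f"gene{i}": max(1, int(5000 / (i + 1))) for i in range(200)}
total = sum(gene_reads.values())
depths = [total // 2 ** k for k in range(8)]  # halve the depth at each step

curve = saturation_curve(gene_reads, depths)
for depth, det, quant in curve:
    print(f"{depth:>7} reads  detected {det:6.1%}  quantified {quant:6.1%}")
```

Each depth is drawn independently from the full pool here; a nested design, where every smaller sample is a subset of the previous one, would give smoother curves at the cost of slightly more bookkeeping.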