NGS technologies DepthOfCoverage SNPs and Human Diseases XV Robert Kraaij Department of Internal Medicine r.kraaij@erasmusmc.nl
What will NGS bring us? RFLP, TaqMan, array, array and imputation, regional sequencing, full genome sequencing
• First generation: a bit of history
• Next (second) generation
• Third generation
1977: Maxam & Gilbert sequencing (Walter Gilbert; image from wikipedia.org)
Maxam & Gilbert sequencing: four chemical cleavage reactions, G, G+A, C+T and C
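Reading a Maxam & Gilbert gel means combining lanes: the G+A reaction also cuts at G, and the C+T reaction also cuts at C, so a band in G+A without a matching G band is an A, and a band in C+T without a C band is a T. A minimal sketch, assuming the gel has already been turned into a list of which lanes show a band at each fragment length (shortest first); `read_maxam_gilbert` and the example gel are hypothetical.

```python
def read_maxam_gilbert(bands):
    """Infer a sequence from Maxam & Gilbert lanes G, G+A, C+T and C.

    bands: one set of lane names per fragment length, shortest fragment first.
    """
    sequence = []
    for lanes in bands:
        if "G" in lanes:          # G cleavage (the band also shows in G+A)
            sequence.append("G")
        elif "G+A" in lanes:      # purine cut without a G band: A
            sequence.append("A")
        elif "C" in lanes:        # C cleavage (the band also shows in C+T)
            sequence.append("C")
        elif "C+T" in lanes:      # pyrimidine cut without a C band: T
            sequence.append("T")
        else:
            sequence.append("N")  # no band at this length: base unreadable
    return "".join(sequence)


# Hypothetical gel: four fragment lengths, shortest first.
gel = [{"G", "G+A"}, {"G+A"}, {"C+T"}, {"C", "C+T"}]
print(read_maxam_gilbert(gel))  # GATC
```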
1977: Sanger sequencing (Frederick Sanger; image from wikipedia.org)
Sanger sequencing: four chain-termination reactions, G, A, T and C
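In Sanger sequencing each of the four reactions contains one chain-terminating dideoxynucleotide, so a lane holds the fragments that end in that base, and reading the gel from the shortest to the longest fragment gives the newly synthesised strand. A toy simulation, assuming perfect termination and a template written in the order the polymerase reads it; `sanger_lanes` and `read_gel` are hypothetical names.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_lanes(template):
    """Fragment lengths per ddNTP lane for a template (written in reading order)."""
    new_strand = "".join(COMPLEMENT[base] for base in template)
    lanes = {base: [] for base in "GATC"}
    # A fragment stops wherever a ddNTP is incorporated, so every position of
    # the new strand yields one fragment length in the lane of its base.
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the bands from the shortest to the longest fragment."""
    base_at = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(base_at[length] for length in sorted(base_at))


lanes = sanger_lanes("CTAGGC")
print(read_gel(lanes))  # GATCCG
```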
Sanger sequencing landmarks (from wikipedia.org)
• 1977: bacteriophage φX174, 5.4 kb
• 1984: Epstein-Barr virus, 170 kb
• 1995: Haemophilus influenzae, 1.8 Mb
• 2001: human, 3 Gb
The Human Genome Project (pictured: Bill Clinton, Tony Blair, Craig Venter, Francis Collins)
June 26th, 2000: working draft, 95% sequenced
April 14th, 2003: finished, 99% sequenced
Cost: $2.7 billion (budgeted: $3 billion)
Timing: 1990 to 2003 (planned: until 2005)
Next Generation: Illumina
Sequencing workflow: DNA isolation → library preparation → sequencing → data analysis
Illumina sequencing
• fragment DNA
• clonal amplification on the flow cell by bridge PCR
• sequencing-by-synthesis
Bridge amplification
Sequencing by synthesis
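Illumina's sequencing-by-synthesis uses fluorescently labelled, reversibly terminated nucleotides, so each cycle extends every strand by exactly one base, which is imaged before the label and terminator are removed. The toy loop below only captures that one-base-per-cycle bookkeeping (no chemistry, no errors); `sequence_by_synthesis` is a hypothetical name.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sequence_by_synthesis(template, cycles):
    """Return the read produced by `cycles` rounds of one-base extension."""
    read = []
    for position in range(min(cycles, len(template))):
        # 1. incorporate one labelled, terminated nucleotide complementary to the template
        incorporated = COMPLEMENT[template[position]]
        # 2. image the cluster: the label identifies the added base
        read.append(incorporated)
        # 3. cleave label and terminator (implicit here), so the next cycle can start
    return "".join(read)


print(sequence_by_synthesis("CTAATGCC", cycles=5))  # GATTA
```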
Per-cycle imaging: after each cycle the flow cell is imaged, and each base (G, A, T, C) is recognised by its own fluorescent signal
Per-cycle base calling: the same call (G) can have good or poor quality, depending on how clean the signal is
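One way to express how clean a signal is: a purity ratio, similar in spirit to Illumina's "chastity" (the brightest intensity divided by the sum of the two brightest). A simplified sketch, assuming four background-corrected intensities per cluster in G, A, T, C order; the intensities are made up, and real base callers use calibrated models to turn such features into Phred scores.

```python
BASES = "GATC"  # channel order assumed for this sketch


def call_base(intensities):
    """Call a base from four channel intensities and report signal purity.

    Purity = brightest / (brightest + second brightest): close to 1.0 for a
    clean signal, close to 0.5 when two channels are almost equally bright.
    """
    ranked = sorted(zip(intensities, BASES), reverse=True)
    (top, base), (second, _) = ranked[0], ranked[1]
    total = top + second
    purity = top / total if total > 0 else 0.0
    return base, purity


clean = call_base([900, 50, 30, 20])   # G, purity about 0.95: good quality
mixed = call_base([500, 420, 40, 30])  # G, purity about 0.54: poor quality
print(clean, mixed)
```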
Quality scoring (Phred score)
Phred score   Incorrect base call   Accuracy
10            1 in 10               90%
20            1 in 100              99%
30            1 in 1,000            99.9%
40            1 in 10,000           99.99%
50            1 in 100,000          99.999%
Scores 0 to 93 are stored as ASCII characters 33 to 126, a single character per base.
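Each row of the table follows Q = -10 log10(P), where P is the probability that the base call is wrong; in FASTQ files the score is stored as the character with ASCII code Q + 33 (often called Phred+33). A small sketch of both conversions; the function names are illustrative.

```python
import math


def phred_to_error(q):
    """Probability that the base call is wrong: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score from an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def encode_quality(q):
    """Phred+33: scores 0..93 map to ASCII 33..126 (one character per base)."""
    if not 0 <= q <= 93:
        raise ValueError("Phred+33 scores must lie between 0 and 93")
    return chr(q + 33)


def decode_quality(ch):
    """Inverse of encode_quality."""
    return ord(ch) - 33


for q in (10, 20, 30, 40, 50):
    print(q, phred_to_error(q), encode_quality(q))
```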
FASTQ file (one record: identifier, sequence, separator, quality string)
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTC
+SEQ_ID
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>
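Each FASTQ record is four lines, and the quality string has one character per base. A minimal parser sketch for a single record, decoding the qualities with the Phred+33 offset; `parse_fastq_record` is a hypothetical helper, and real files should be read with an established library.

```python
def parse_fastq_record(lines):
    """Parse one four-line FASTQ record into (id, sequence, Phred scores)."""
    header, seq, plus, qual = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality string differ in length")
    scores = [ord(ch) - 33 for ch in qual]  # Phred+33 decoding
    return header[1:], seq, scores


record = [
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTC",
    "+SEQ_ID",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>",
]
seq_id, seq, scores = parse_fastq_record(record)
print(seq_id, len(seq), scores[:5])  # SEQ_ID 48 [0, 6, 6, 9, 7]
```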
Alignment (mapping) of reads to the reference genome (hg19)
Reference: GATTACGGTACTTGCATAGCT
Read:         TACGGTACTTGCATA
Each mapped read gets a chromosome + position + strand, stored in sample.bam
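Mapping assigns each read a chromosome, a 1-based start position and a strand. A toy exact-match mapper illustrating that output: it tries the read and its reverse complement against each chromosome. Real aligners (e.g. BWA) use indexed, mismatch- and gap-tolerant search; the chromosome name `chr1` is hypothetical and the sequences are the slide's example, not hg19.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def map_read(genome, read):
    """Return (chromosome, 1-based position, strand) of the first exact hit, or None."""
    for chrom, sequence in genome.items():
        for strand, query in (("+", read), ("-", reverse_complement(read))):
            index = sequence.find(query)
            if index != -1:
                return chrom, index + 1, strand
    return None


genome = {"chr1": "GATTACGGTACTTGCATAGCT"}  # hypothetical one-chromosome genome
print(map_read(genome, "TACGGTACTTGCATA"))  # ('chr1', 4, '+')
```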
Run QC and filtering (sample.bam)
Sorted BAM file (sample.bam) contains:
• both reads
• quality scores
• chromosome
• position
• quality flag
• duplicate flag
• off-target flag
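In a BAM file several of these properties live in the bitwise FLAG field defined by the SAM specification (for example 0x400 for duplicates and 0x200 for reads failing quality checks). A small decoder for the standard bits; the specification has no off-target bit, so that annotation presumably comes from the analysis pipeline (an assumption here).

```python
# Bit values of the FLAG field, as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM/BAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(1107))
```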
Coverage (figure: reads stacked on the reference genome; 5x coverage = five reads covering the same position)
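Depth of coverage at a position is the number of reads overlapping it. A sketch using a difference array (+1 at each read start, -1 just past each read end, then a running sum), assuming reads are given as 0-based, end-exclusive (start, end) intervals; the intervals in the example are made up.

```python
def depth_of_coverage(reads, length):
    """Per-position read depth for reads given as (start, end) intervals."""
    diff = [0] * (length + 1)
    for start, end in reads:
        diff[start] += 1  # a read begins here
        diff[end] -= 1    # and no longer counts from here on
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


reads = [(0, 5), (2, 7), (3, 8), (3, 6), (1, 9)]  # hypothetical reads
depth = depth_of_coverage(reads, 10)
print(depth)  # [1, 2, 3, 5, 5, 4, 3, 2, 1, 0]
```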
Variant calling (figure: reads stacked on the reference): the reference has A at the site, but all reads carry G = homozygous alternative
Variant calling (figure: reads stacked on the reference): at the same site some reads carry A and others G = A/G, heterozygous
Variant calling (figure): only four reads cover the site, two with A and two with G = A/G, heterozygous? (with so few reads the call is uncertain)
Variant calling (figure): the same four reads, now annotated with sequencing quality (poor vs good); base quality should be taken into account in the call
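The four variant-calling situations (all G, a mix, few reads, poor quality) can be captured by a textbook diploid genotype-likelihood model: each read samples either chromosome with probability 1/2, and a base matches the true allele with probability 1 - e, where e comes from its Phred score. A minimal sketch of that model, not the algorithm of any particular variant caller; function names and read data are hypothetical.

```python
import math


def base_likelihood(base, allele, error):
    """P(observed base | true allele), with errors spread over the other 3 bases."""
    return 1 - error if base == allele else error / 3


def genotype_log_likelihoods(observations, ref, alt):
    """log10-likelihoods of ref/ref, ref/alt and alt/alt.

    observations: (base, phred_quality) for every read covering the site.
    """
    genotypes = {f"{ref}/{ref}": (ref, ref), f"{ref}/{alt}": (ref, alt), f"{alt}/{alt}": (alt, alt)}
    result = {}
    for name, (a1, a2) in genotypes.items():
        total = 0.0
        for base, qual in observations:
            # Cap at 0.75 (a random base) so a Q0 base never gives probability 0.
            e = min(10 ** (-qual / 10), 0.75)
            p = 0.5 * base_likelihood(base, a1, e) + 0.5 * base_likelihood(base, a2, e)
            total += math.log10(p)
        result[name] = total
    return result


def call_genotype(observations, ref, alt):
    """Best genotype and its log10 likelihood margin over the runner-up."""
    lls = genotype_log_likelihoods(observations, ref, alt)
    ranked = sorted(lls.items(), key=lambda item: item[1], reverse=True)
    return ranked[0][0], ranked[0][1] - ranked[1][1]


hom_alt = [("G", 30)] * 10
het_10 = [("A", 30)] * 5 + [("G", 30)] * 5
het_4 = [("A", 30)] * 2 + [("G", 30)] * 2
low_q = [("A", 30)] * 4 + [("G", 5)]   # the lone G has poor quality
high_q = [("A", 30)] * 4 + [("G", 30)]  # the lone G has good quality
for obs in (hom_alt, het_10, het_4, low_q, high_q):
    print(call_genotype(obs, "A", "G"))
```

With four reads the heterozygous call wins by a much smaller margin than with ten, and a single low-quality G is not enough to move the call away from A/A.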
Illumina: normal flow cell technology
Instrument   Read length  Output      Clusters  Run time  Price    Cost per whole genome
MiniSeq      2 x 150 b    6.6 Gb      22M       1 day     50k$     n/a
MiSeq        2 x 300 b    13 Gb       22M       3 days    100k €   n/a
NextSeq500   2 x 150 b    100 Gb      0.4B      1 day     250k €   4250 $
HiSeq2500    2 x 125 b    450/900 Gb  2B/4B     6 days    700k$    3500 $
Illumina: patterned flow cell technology
Instrument    Read length  Output       Clusters   Run time  Price       Cost per whole genome
HiSeq4000     2 x 150 b    0.65/1.3 Tb  2/4 B      4 days    900k$       2500 $
HiSeqX Five   2 x 150 b    0.8/1.6 Tb   2.5/5 B    3 days    5 x 1.2M$   1500 $
HiSeqX Ten    2 x 150 b    0.8/1.6 Tb   2.5/5 B    3 days    10 x 1M €   1000 $
NovaSeq6000   2 x 150 b    0.85/1.7 Tb  2.8/5.6 B  2 days    1M €        1200 $
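The output column is roughly clusters × 2 reads (paired-end) × read length: 2.8 billion clusters × 2 × 150 b gives 840 Gb, close to the 0.85 Tb listed for the NovaSeq6000. Dividing by the ~3 Gb human genome gives a rough mean depth. A back-of-the-envelope sketch; it assumes every cluster passes filter and every base is usable, which real runs never achieve, and the 30x target depth is an assumption.

```python
def run_yield_gb(clusters, read_length, paired=True):
    """Nominal yield in Gb: clusters x reads per cluster x read length."""
    reads_per_cluster = 2 if paired else 1
    return clusters * reads_per_cluster * read_length / 1e9


def mean_coverage(yield_gb, genome_size_gb=3.0):
    """Average depth if the yield were spread evenly over the genome."""
    return yield_gb / genome_size_gb


def whole_genomes(yield_gb, depth=30, genome_size_gb=3.0):
    """How many genomes at the given depth fit into one run's yield."""
    return int(yield_gb // (depth * genome_size_gb))


novaseq = run_yield_gb(2.8e9, 150)  # one flow cell, 2 x 150 b
print(novaseq, mean_coverage(novaseq), whole_genomes(novaseq))
```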
Illumina: patterned flow cell technology
• patterned flow cell with billions of nanowells
• extremely high density
• no overlapping clusters
• special polymerase? (ExAmp clustering)
• primer swaps
Next Generation: Roche 454
Roche 454
• fragment DNA
• clonal amplification on beads by emPCR
• load beads into the PicoTiterPlate
• sequencing-by-synthesis
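Roche 454 is based on pyrosequencing: one nucleotide type is flowed over the plate at a time, in a fixed cyclic order, and a whole homopolymer run is incorporated in a single flow, giving a light signal roughly proportional to its length. A toy flowgram simulator, assuming the flow order TACG and noise-free signals; in practice this is why long homopolymers are hard to count exactly on flow-based platforms.

```python
def flowgram(read, flow_order="TACG"):
    """Simulate noise-free pyrosequencing signals as (flowed base, run length) pairs."""
    signals, i, flow = [], 0, 0
    while i < len(read):
        base = flow_order[flow % len(flow_order)]
        run = 0
        # All consecutive bases matching the flowed nucleotide are added at once.
        while i < len(read) and read[i] == base:
            run += 1
            i += 1
        signals.append((base, run))
        flow += 1
    return signals


def decode_flowgram(signals):
    """Rebuild the sequence: each flow contributes `run` copies of its base."""
    return "".join(base * run for base, run in signals)


signals = flowgram("TTACGGGA")
print(signals)  # [('T', 2), ('A', 1), ('C', 1), ('G', 3), ('T', 0), ('A', 1)]
```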
Ion Torrent
Ion Torrent
• fragment DNA
• clonal amplification on beads by emPCR
• load beads onto the chip
• sequencing-by-synthesis
Third generation sequencing = single molecule sequencing
Third generation: PacBio (RS, Sequel)
Update at the time of the lecture: Illumina announced it would acquire PacBio (the deal was abandoned in 2020)
SMRT technology: library prep → circular DNA → SMRT cell
PacBio
• no DNA amplification
• real-time imaging of DNA polymerase
• sequencing-by-synthesis
SMRT technology
• >10 kb reads
• 1 Gb output
• better chemistry
Applications: de novo assembly, haplotyping, variant calling
Source: The Genomics Resource Center, University of Maryland (http://www.igs.umaryland.edu), posted February 10, 2014
Oxford Nanopore
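Oxford Nanopore: about 6 bases sit in the pore at any time, so each base is measured about 6 times during base calling; base-caller development is driven by the community. Example sequence: ACCCGTCCG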
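The example sequence can be read as overlapping 6-mers: as the strand moves through the pore one base at a time, the current reflects the ~6 bases inside it, so each base away from the ends contributes to six consecutive signal levels. The sketch only does this k-mer bookkeeping (k = 6 as on the slide); turning raw current into bases is the base caller's job and is not modelled here. Function names are hypothetical.

```python
def pore_kmers(sequence, k=6):
    """Overlapping k-mers that occupy the pore as the strand moves one base at a time."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def times_in_pore(sequence, k=6):
    """For each base, the number of k-mer windows (signal levels) it contributes to."""
    n_windows = len(sequence) - k + 1
    return [sum(1 for i in range(n_windows) if i <= j < i + k) for j in range(len(sequence))]


print(pore_kmers("ACCCGTCCG"))     # ['ACCCGT', 'CCCGTC', 'CCGTCC', 'CGTCCG']
print(times_in_pore("ACCCGTCCG"))  # [1, 2, 3, 4, 4, 4, 3, 2, 1]
```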
Oxford Nanopore High error rate, but major improvement in 2017…
Recommendations
More recommendations