Ultra high throughput DNA sequencing technologies Keith Harshman DNA Array Facility Center for Integrative Genomics University of Lausanne
Outline:
1. What UHTS is replacing: Sanger sequencing/CE
2. Current UHTS “next generation” technologies:
   a. Illumina Genome Analyzer II (aka “Solexa”)
   b. Applied Biosystems’ SOLiD
   c. 454
3. Some next next generation technologies
4. Some next next next generation technologies
Human Genome Re-sequencing using the Sanger Method
• 5.3x coverage
• $2,000,000-$4,000,000
• 27,000,000 AB 3730 reads
• ~15,000,000 plasmid preps
Enter UHTS (following a brief performance by MPSS)
Ultra high throughput/output:
3730: ~1 x 10^6 bases/day (12 x 96 sample run/day; 900bp reads)
Genome Analyzer II: ~2 x 10^9 bases/run = ~670 x 10^6 bases/day (35bp reads)
= 25x 35bp reads ≠ 1x 900bp read
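The throughput figures above follow from simple arithmetic. A minimal sketch, assuming a 3-day Genome Analyzer II run (the ~670 x 10^6 bases/day figure implies roughly this); the function names are illustrative only.

```python
def sanger_bases_per_day(runs_per_day=12, samples_per_run=96, read_length=900):
    """Bases per day for a capillary instrument such as the AB 3730."""
    return runs_per_day * samples_per_run * read_length


def ga2_bases_per_day(bases_per_run=2e9, days_per_run=3):
    """Bases per day for the Genome Analyzer II; days_per_run=3 is an assumption."""
    return bases_per_run / days_per_run


sanger = sanger_bases_per_day()
ga2 = ga2_bases_per_day()
ratio = ga2 / sanger

# 25 short reads carry about as many bases as one long read,
# but they are not equivalent for assembly or mapping.
bases_in_25_short_reads = 25 * 35

print(f"3730:  {sanger:,} bases/day")
print(f"GA II: {ga2:,.0f} bases/day")
print(f"GA II / 3730 ratio: {ratio:,.0f}x")
print(f"25 x 35bp reads = {bases_in_25_short_reads} bases (vs. one 900bp read)")
```

The raw base count hides the point of the "≠" above: short reads give the same number of bases but far less contiguous information per read.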
Illumina Genome Analyzer II
Sequencing Process
1. Library prep (~6 hrs): Fragment DNA; Repair ends / Add A overhang; Ligate adapters; Select ligated DNA
2. Automated Cluster Generation (~5 hrs), 1-8 samples: Hybridize to flow cell; Extend hybridized oligos; Perform bridge amplification
3. Sequencing (~48 to 72 hrs), 1-8 samples: Perform sequencing; Generate base calls
Genomic DNA Library Prep: DNA fragments → Blunting by fill-in and exonuclease → Phosphorylation → Addition of A-overhang → Ligation to adapters
Cluster generation: Cluster Station
• Aspirates DNA samples into flow cell
• Automates the formation of amplified clonal clusters from the DNA single molecules
(Figure labels: flow cell, clamped into place; DNA libraries)
Flow cell: 8 channels. Surface of flow cell coated with a lawn of oligo pairs.
Key to the simplified workflow:
• Clonal clusters are generated in a contained environment (need no clean rooms)
• Sequencing also performed in the flow cell on the generated clusters
Cluster generation: Hybridize fragment & extend
• > 50 M single molecules hybridize to the lawn of primers
• Bound molecules are then extended by polymerases
(Figure labels: adapter sequence; 3’ extension)
Cluster generation: Denature double-stranded DNA
• Double-stranded molecule is denatured.
• Original template is washed away.
• Newly synthesized strand is covalently attached to the flow cell surface.
(Figure labels: newly synthesized strand; original template; discard)
Cluster generation: Covalently bound spatially separated single molecules Single molecules bound to flow cell in a random pattern
Cluster generation: Bridge amplification Single-strand flips over to hybridize to adjacent primers to form a bridge. Hybridized primer is extended by polymerases.
Cluster generation: Bridge amplification. Double-stranded bridge is formed.
Cluster generation: Bridge amplification. Double-stranded bridge is denatured. Result: Two copies of covalently bound single-stranded templates.
Cluster generation: Bridge amplification Single-strands flip over to hybridize to adjacent primers to form bridges. Hybridized primer is extended by polymerase.
Cluster generation: Bridge amplification Bridge amplification cycle repeated till multiple bridges are formed
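Under the idealized assumption that every strand in a cluster is copied once per bridge amplification cycle, the cluster doubles in size each cycle. The sketch below is a toy model only, not a description of the instrument's chemistry; `efficiency` and the cycle count are hypothetical parameters, and real clusters are also limited by the primer lawn around them.

```python
def cluster_size(cycles, efficiency=1.0):
    """Expected number of strands in one cluster after `cycles` rounds.

    efficiency is the (hypothetical) fraction of strands that successfully
    bridge and get copied in each cycle; 1.0 means perfect doubling.
    """
    strands = 1.0  # each cluster starts from a single bound molecule
    for _ in range(cycles):
        strands += strands * efficiency
    return strands


perfect = cluster_size(10)
imperfect = cluster_size(10, efficiency=0.5)
print(f"10 cycles, perfect doubling: {perfect:.0f} strands")
print(f"10 cycles, 50% efficiency:   {imperfect:.1f} strands")
```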
Cluster generation dsDNA bridges denatured. Reverse strands cleaved and washed away…..
Cluster generation … leaving a cluster with forward strands only.
Cluster generation Free 3’ ends are blocked to prevent unwanted DNA priming.
Sequencing: Sequencing primer is hybridized to adapter sequence. (Figure label: sequencing primer)
Genome Analyzer II Sequencing
1. Hybridize sequencing primer
2. Add 4 Fl-NTP’s + Polymerase
3. Incorporated Fl-NTP is imaged
4. Terminator and fluorescent dye are cleaved from the Fl-NTP
Steps 2-4 repeated X 36 - 50
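The incorporate, image, cleave cycle can be illustrated with a toy simulation. This is a sketch under simplifying assumptions (error-free chemistry, template given in the order the polymerase reads it); `sequence_by_synthesis` is an illustrative name, not part of any instrument software.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sequence_by_synthesis(template, cycles=36):
    """Return the read produced by `cycles` rounds of incorporate/image/cleave.

    `template` is the strand being copied, written in the order the
    polymerase traverses it (an assumption made to keep the example short).
    """
    read = []
    for position in range(min(cycles, len(template))):
        # Incorporate: only the terminated, labelled nucleotide that pairs
        # with the template base is added, so extension stops after one base.
        incorporated = COMPLEMENT[template[position]]
        # Image: the dye colour identifies which base was incorporated.
        read.append(incorporated)
        # Cleave: terminator and dye are removed, allowing the next cycle.
    return "".join(read)


short_read = sequence_by_synthesis("ACGTACGT", cycles=3)
print(short_read)
```

The read length is set by the number of cycles, which is why the slide quotes it as "X 36 - 50".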
Flow cell imaging (figure labels: fluidics ports; flow cell; prism; laser)
Genome Analyzer II imaging setup: 50 tiles/column X 2 columns/channel X 8 channels/flow cell (figure labels: camera; objective lens; oil; flow cell; prism; laser; tile)
Genome Analyzer II Sequencing: 50 million clusters per flow cell (image scale bars: 20 microns; 100 microns)
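Combining the imaging layout (50 tiles per column, 2 columns per channel, 8 channels per flow cell) with the 50 million clusters per flow cell gives a rough cluster density per tile. A minimal sketch, assuming clusters are spread evenly over all tiles:

```python
TILES_PER_COLUMN = 50
COLUMNS_PER_CHANNEL = 2
CHANNELS_PER_FLOW_CELL = 8
CLUSTERS_PER_FLOW_CELL = 50_000_000

tiles_per_channel = TILES_PER_COLUMN * COLUMNS_PER_CHANNEL
tiles_per_flow_cell = tiles_per_channel * CHANNELS_PER_FLOW_CELL
# Assumption: clusters are distributed uniformly across tiles.
clusters_per_tile = CLUSTERS_PER_FLOW_CELL / tiles_per_flow_cell

print(f"tiles per channel:   {tiles_per_channel}")
print(f"tiles per flow cell: {tiles_per_flow_cell}")
print(f"clusters per tile:   {clusters_per_tile:,.0f}")
```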
Base Calling: The identity of each base of a cluster is read off from sequential images. (Figure: images from cycles 1-9; one cluster reads T G C T A C G A T …, another reads T T T T T T T G T …)
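Reading a base off sequential images can be sketched as picking the brightest of four colour channels at each cycle. This is a deliberate simplification: the intensity values are made up for illustration, and production base callers also correct for effects such as channel crosstalk, which this sketch ignores.

```python
BASES = "ACGT"


def call_bases(intensities):
    """Call one base per cycle for a single cluster.

    intensities: list of per-cycle [A, C, G, T] signal values
    (hypothetical numbers, one row per sequencing cycle).
    """
    calls = []
    for cycle_signal in intensities:
        brightest = max(range(4), key=lambda i: cycle_signal[i])
        calls.append(BASES[brightest])
    return "".join(calls)


example_cluster = [
    [0.1, 0.0, 0.2, 0.9],  # cycle 1: T channel brightest
    [0.1, 0.1, 0.8, 0.2],  # cycle 2: G channel brightest
    [0.0, 0.9, 0.1, 0.1],  # cycle 3: C channel brightest
]
called = call_bases(example_cluster)
print(called)
```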
What comes out today:
– 36bp standard read length; enabled for 50-75bp
– >50 million reads per 8-channel (lane) flowcell; >6.25 million reads per channel
– >1.5GB per standard run; >3GB per paired-end run
– 2 day standard and 4 day paired-end run
– Raw read accuracy of >99.5% (36bp)
– Consensus accuracy of >99.999% (20x depth of coverage)
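The per-channel and per-run figures above are consistent with each other, as a quick check shows. A minimal sketch using the quoted lower bounds; the expected-error estimate assumes errors are independent and uniform along the read, which is a simplification.

```python
READS_PER_FLOW_CELL = 50_000_000
CHANNELS = 8
READ_LENGTH = 36
RAW_ACCURACY = 0.995

reads_per_channel = READS_PER_FLOW_CELL / CHANNELS
bases_per_run = READS_PER_FLOW_CELL * READ_LENGTH
# Assumption: each base is wrong independently with probability 1 - accuracy.
expected_errors_per_read = READ_LENGTH * (1 - RAW_ACCURACY)

print(f"reads per channel: {reads_per_channel:,.0f}")
print(f"bases per run:     {bases_per_run:,}")
print(f"expected errors per {READ_LENGTH}bp read: {expected_errors_per_read:.2f}")
```

The 1.8 x 10^9 bases per run computed here sits just above the ">1.5GB per standard run" quoted on the slide.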
What comes out at the end of 2008 (Ha!):
– 36bp → 75bp standard read length
– 50 million → >130 million reads per flowcell; >6.25 million → >16 million reads per channel
– >1.5GB → 10GB per standard run; >3GB → 20GB per paired-end run
– 2 → 3.5 day standard and 4 → 7 day paired-end run
– Raw read accuracy of >99.5% (36bp)
– Consensus accuracy of >99.999% (20x depth of coverage)
Plus improvements in data quality
What goes in: DNA Fragments + Adapters → Sequencing Library
DNA fragment sources: Applications
• Genomic DNA
  - Genome and directed re-sequencing: SNP/mutation; genome structure re-arrangements; breakpoints; CNVs; methylation pattern
  - Genome sequencing: de novo genome sequencing
• ChIP products: transcription factor binding sites; protein complex positioning; methylation patterns
• cDNA: mRNA transcript structure and differential expression; small RNA discovery & differential expression
• ???: ????
454 and SOLiD sequencing template preparation
Library preparation by Emulsion PCR
Figure (Fan et al., Nature Reviews Genetics 2006): single DNA molecules (DNA to be sequenced, as single-stranded PCR template) + capture beads + PCR mix → emulsion PCR → clonal sequencing template → sequencing chambers
SOLiD: 90bp template fragment size; 1 µm beads; 10-20,000 template copies/bead
454: 300-500bp template fragment size; 30 µm beads; “millions” template copies/bead
454/Roche
Sequencing-by-Synthesis – pyrosequencing (454)
Sequencing Technologies
ABI 3730xl: ~1 x 10^6 bases per day (at 15 runs/day)
  800 bases per read and 1250 reads per day
  Cost to sequence a human genome (2007): $4,000,000
454/Roche: ~100 x 10^6 bases per day (at 1 run/day)
  250 bases per read and 400,000 reads per run
  Cost to sequence a human genome (2007): $1,000,000
Illumina GA II/SOLiD: ~1.5–3.0 x 10^9 bases per run (1 run/3 days)
  35 bases per read and 40-100 x 10^6 reads per run
  Cost to sequence a human genome (2008): $100,000 (GA2); $60,000 (SOLiD)
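To put the per-day throughputs in perspective, the sketch below estimates how many instrument-days each platform needs to generate a given depth of coverage of a human genome. The genome size of 3 x 10^9 bases is an assumption (an approximate round figure), and the GA II/SOLiD rate uses the low end of the quoted range; it is an illustration, not a cost model.

```python
HUMAN_GENOME_BASES = 3e9  # assumption: approximate human genome size

# Bases per day, taken from the figures quoted above.
BASES_PER_DAY = {
    "ABI 3730xl": 800 * 1250,            # 800 bases/read x 1250 reads/day
    "454/Roche": 250 * 400_000,          # 250 bases/read x 400,000 reads/run, 1 run/day
    "Illumina GA II/SOLiD": 1.5e9 / 3,   # low end of range, 1 run per 3 days
}


def days_for_coverage(bases_per_day, coverage=1.0, genome=HUMAN_GENOME_BASES):
    """Instrument-days needed to generate `coverage`-fold raw sequence."""
    return coverage * genome / bases_per_day


days_1x = {name: days_for_coverage(rate) for name, rate in BASES_PER_DAY.items()}
for name, days in days_1x.items():
    print(f"{name}: {days:,.0f} instrument-days for 1x coverage")
```

Multiplying by the target depth (for example `coverage=20`) scales these numbers linearly.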
The Next Next Generation Technologies
• Complete Genomics (http://www.completegenomics.com): Sequencing of DNA Nano-balls (DNBs) using combinatorial Probe-Anchor Ligation (cPAL)
• Pacific Biosciences (http://www.pacificbiosciences.com): Single Molecule Real Time DNA sequencing based on zero mode waveguides
Complete Genomics – Library Generation Library construction Template Amplification
Complete Genomics – Sequencing Sequencing surface Sequencing chemistry “Complete Genomics says that by next spring it will be conducting complete genome scans for $5,000.” -BioITWorld.com 6 January 2009
Pacific Biosciences – Sequencing vessel and method
Pacific Biosciences – Sequencing
Nanopore Sequencing: the Next Next Next Generation Sequencing Technology (?)