Olga Vinnere Pettersson, PhD
National Genomics Infrastructure hosted by SciLifeLab, Uppsala Node (UGC)
Version 6.3
Outline
• A bit of history
• NGS technologies & sample prep
• NGS applications
• National Genomics Infrastructure – Sweden
Image: www.robustpm.com
What is sequencing? https://figures.boundless-cdn.com
Once upon a time…
• Frederick Sanger and Alan Coulson: Chain Termination Sequencing (1977), Nobel prize 1980
• Principle: SYNTHESIS of DNA is randomly TERMINATED at different points, because the terminating nucleotide lacks the OH-group at the 3' position of deoxyribose
• Separation of fragments that differ in size by 1 nucleotide
• 1 molecule sequenced at a time = 1 read
• Capillary sequencer: 384 reads per run
2006 REVOLUTION
• Thousands of molecules sequenced in parallel
• 1 mln reads sequenced per run
• Roche 454 GS FLX
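The chain-termination principle can be sketched as a toy model: every position where a given base is incorporated yields one terminated fragment, and sorting the fragments by size reads out the sequence. The function name `read_by_chain_termination` is hypothetical, and the model ignores everything about real chemistry and electrophoresis; it only illustrates the logic of the slide.

```python
def read_by_chain_termination(new_strand: str) -> str:
    """Toy model: recover a synthesized strand from its terminated fragments."""
    fragments = []  # (fragment length, terminating base)
    for dd_base in "ACGT":  # one reaction per terminating nucleotide
        for position, base in enumerate(new_strand, start=1):
            if base == dd_base:
                # Synthesis can stop here, giving a fragment of this length
                fragments.append((position, dd_base))
    # Size separation at 1-nucleotide resolution: shortest fragment first
    fragments.sort()
    return "".join(base for _, base in fragments)


print(read_by_chain_termination("GATTACA"))  # prints GATTACA
```

Because every fragment length occurs exactly once, the sorted ladder of fragment ends spells the strand one base at a time.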
Technologies
NGS technologies
Company: platform; amplification; sequencing method
• Roche: 454 (until 2016); emPCR; pyrosequencing
• Illumina: HiSeq, MiSeq, NextSeq, X10; bridge PCR; synthesis
• LifeTechnologies (Thermo Fisher): Ion Torrent, Ion Proton, S5; emPCR; synthesis (pH)
• Pacific Biosciences: RSII, SEQUEL; none; synthesis (SMRT)
• Complete Genomics: Nanoballs; none; ligation
• Oxford Nanopore*: MinION, GridION; none; flow
RIP technologies: Helicos, Polonator, SOLiD, 454 etc.
In development: tunneling currents, nanopores, etc.
Differences between platforms
• Technology: chemistry + signal detection
• Run times vary from hours to days
• Production ranges from Mb to Gb
• Read length from <100 bp to >20 kb
• Error rate per base from 0.1% to 15%
• Cost per base
Illumina
Instrument: yield and run time; read length; error rate; error type
• HiSeq2500: 120 Gb – 600 Gb, 27 h or standard run; 100x100 (250x250); 0.1%; substitutions
• MiSeq: 540 Mb – 15 Gb (4 – 48 hours); up to 350x350; 0.1%; substitutions
• HiSeqXten: 800 Gb – 1.8 Tb (3 days); 150x150; error rate and type as above
Main applications
• Whole genome, exome and targeted reseq
• Transcriptome analyses
• Methylome and ChIP-seq
• Rapid targeted resequencing (MiSeq)
• Human genome seq (Xten)
Illumina: bridge amplification
• 200M fragments per lane
• Bridge amplification
• Ends with blocking of free 3'-ends and hybridisation of sequencing primer
Ion Torrent
Chip: yield; run time; read length
• 314, 316, 318 (PGM): 0.1 – 1 Gb; 3 hrs; 200 – 400 bp
• P-I (Proton): 10 Gb; 4 hrs; 200 bp
• 520, 530, 540 (S5): 1 Gb – 10 Gb; 3 hrs (except 540); 400 (600) bp
Main applications
• Microbial and metagenomic sequencing
• Targeted re-sequencing (gene panels)
• Clinical sequencing
Ion Torrent: H+ ion-sensitive field effect transistors
Ion PGM (chip: reads; read length; yield)
• 314: 250 000; 400 bp; 100 Mb
• 316: 4 mln; 400 bp; 500 Mb
• 318: 9 mln; 400 bp; 1 Gb
Ion S5XL (chip: reads; read length; yield)
• 520: 8 mln; 400 bp; 1 Gb
• 530: 15-20 mln; 400 bp; 5 Gb
• 540: 90 mln; 200 bp; 10 Gb
Ion Proton (chip: reads; read length; yield)
• PI: 90 mln; 200 bp; 10-18 Gb
PacBio SMRT-technology: Single-Molecule, Real-Time DNA sequencing
Instrument: yield and run time; read length; error rate; error type
• RS II: 250 Mb – 1.3 Gb per SMRT cell, 30 – 360 min; 250 bp – 30 kb (74 kb); 15% (on a single passage!); insertions, random
• SEQUEL: 2 – 6 Gb per SMRT cell, 30 – 360 min; 250 bp – 25 kb; as RSII; as RSII
PacBio SMRT - technology Single Molecule Real Time
SMRT sequencing: common misconceptions
High error rate?
• Largely irrelevant at sufficient coverage, because errors are random
• Examples:
– 8 Mb genome, 8 SNPs detected
– 65 kb construct: 100% correct sequence
– Detection of low frequency mutations
High price?
• Not for small genomes
• Bioinfo-time to assemble short reads vs bioinfo-time to assemble long reads
• Better assembly quality
• Single-molecule reads without PCR-bias
Oxford Nanopore MinION
• Reads up to 100 kb
• 1D and 2D reads
• 15-40% error rate
• Life time 5 days
Main types of equipment
• PacBio RSII: ultra-long reads, FAST throughput
• Illumina HiSeq, Xten, MiSeq: short paired reads, HIGH throughput
• Ion Torrent PGM, Ion Proton, Ion S5 XL: short single-end reads, FAST throughput
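Why random errors matter less as coverage grows can be illustrated with a simplified binomial model. This is a sketch under stated assumptions, not how a real consensus caller works: each read is assumed to be wrong at a given position independently with probability `p`, and the consensus is counted as wrong whenever at least half of the reads are wrong (a pessimistic bound, since random errors rarely agree with each other). The function name `consensus_error` is hypothetical.

```python
from math import comb


def consensus_error(p: float, coverage: int) -> float:
    """Probability that at least half of `coverage` independent reads
    are wrong at one position, given per-read error rate `p`."""
    threshold = (coverage + 1) // 2  # smallest count that is >= coverage / 2
    return sum(
        comb(coverage, k) * p**k * (1 - p) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


# Per-read error rate of 15% (single passage), increasing coverage
for cov in (1, 5, 15, 30):
    print(cov, consensus_error(0.15, cov))
```

With a single read the result is simply the per-read error rate; the value drops steeply as coverage increases, which is the point of the slide.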
Applications
NGS/MPS applications
• Whole genome sequencing:
– De novo sequencing
– Re-sequencing
• Transcriptome sequencing:
– mRNA-seq
– miRNA
– Isoform discovery
• Target re-sequencing:
– Exome
– Large portions of a genome
– Gene panels
– Amplicons
De novo sequencing • Used to create a reference genome without previous reference
De novo vs re-sequencing
De novo
• No bias towards a reference
• No template to adapt to
• Many contigs
• Works best for large-scale events
Re-seq
• Finding similarities to a reference
• Easier to identify SNPs and minor events
• Fewer contigs
• Novel events are lost
De novo sequencing
Illumina strategy
• Sequencing:
– PE library with 350 bp
– PE library with 600 bp
– MP library with 2 kb
– MP library with 5-8-20 kb
– PE: 50-100x, MP: 10-15x
• Analysis:
– ALLPATHS
PacBio strategy
• Sequencing:
– 10-20 kb library
– 50-80x (where 30x are reads above 10 kb)
• Analysis:
– HGAP (haploid)
– FALCON (diploid)
Example: de novo PacBio; Crow
Sequencing results
• Number of SMRT cells: 70
• Total bases per SMRT: 1.39 Gb
• Total reads per SMRT: 106 833
Assembly results, FALCON (PRIMARY / ALTERNATIVE)
• N50: 8.5 Mb / 23 kb
• N75: 3.9 Mb / 18 kb
• Nr contigs: 4375 / 2614
• Longest contig: 36 Mb / 121 kb
• Total length: 1.09 Gb / 45 Mb
Transcriptome sequencing (RNA-seq)
• TOTAL RNA: mRNA, miRNA, non-coding RNA
• Dif.ex.
• Splice isoforms
• Annotation
• Transcriptional regulation
mRNA: rRNA depletion vs polyA selection
rRNA depletion
• Pros: captures on-going transcription; picks up non-coding RNA
• Cons: does not get rid of all rRNA; messy Dif.Ex. profile
• Recommended: 20-40 mln reads (single or PE)
polyA selection
• Pros: gives a clean Dif.Ex. profile
• Cons: does not pick up non-coding RNA
• Recommended: 5-20 mln reads
Alternative for human RNA-seq: AmpliSeq Human Transcriptome panel
• faster, cheaper, works fine with FFPE
• input: 50 ng total RNA
• dif.ex. ONLY
RNA-seq experimental setup
• mRNA only: any kit
• mRNA and miRNA: only specialized kits
• Always use DNase!
• RIN value above 8
• CONTROL vs experimental conditions
• Biological replicates: 4 strongly recommended
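The coverage targets above translate into a sequencing yield by simple multiplication: required bases = genome size × coverage. The sketch below assumes a hypothetical 1.0 Gb genome and uses the upper RS II yield of 1.3 Gb per SMRT cell quoted earlier in the deck; the function names are illustrative only.

```python
import math


def required_yield_gb(genome_size_gb: float, coverage: float) -> float:
    """Total bases (in Gb) needed to reach the target coverage."""
    return genome_size_gb * coverage


def cells_needed(genome_size_gb: float, coverage: float,
                 yield_per_cell_gb: float) -> int:
    """Number of SMRT cells (or runs), rounded up to a whole unit."""
    return math.ceil(required_yield_gb(genome_size_gb, coverage) / yield_per_cell_gb)


genome_gb = 1.0  # hypothetical genome size, not taken from the slides
for cov in (50, 80):  # PacBio strategy: 50-80x
    print(cov, required_yield_gb(genome_gb, cov), cells_needed(genome_gb, cov, 1.3))
```

Real yields per cell vary with library quality, so a planning estimate like this is best treated as a lower bound on the number of cells.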
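For readers unfamiliar with the N50 and N75 figures in the assembly results, here is a minimal sketch of how such statistics are usually computed: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches the given fraction of the assembly. The function name `n_stat` and the contig lengths are made up for illustration; they are not the crow data.

```python
def n_stat(contig_lengths, fraction=0.5):
    """Length L such that contigs of length >= L together contain
    at least `fraction` of all assembled bases (N50 for 0.5)."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("contig list is empty")


toy_contigs = [80, 70, 50, 40, 30, 20]  # hypothetical lengths, total 290
print(n_stat(toy_contigs, 0.5))   # prints 70
print(n_stat(toy_contigs, 0.75))  # prints 40
```

N75 is never larger than N50 for the same assembly, which matches the pattern in the table (3.9 Mb vs 8.5 Mb for the primary contigs).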
RNA-seq experimental setup PacBio Iso-seq : full-length transcriptome seq
Amplicon sequencing
Used a lot in metagenomics
• Community analysis
– rRNA genes & spacers (16S, ITS)
– Functional genes
• Genotyping by sequencing
Amplicon sequencing: SIZE MATTERS…
FOR ANY NGS TECHNOLOGY: size difference among fragments must not exceed 80 bp (or 20% in length)
Reason – preferential amplification of short fragments
• Example 1: tight peak, OK
• Example 2: several sizes, fractionation is needed
• Example 3: broad peak; size selection is needed
=> we HAVE to make several libraries
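The size rule can be turned into a quick check on a list of expected amplicon lengths. This is a sketch under two assumptions that the slide does not spell out: the 20% is taken relative to the longest fragment, and the pool passes if either the 80 bp bound or the 20% bound holds. The function name `amplicon_pool_ok` is hypothetical.

```python
def amplicon_pool_ok(sizes_bp, max_diff_bp=80, max_rel_diff=0.20):
    """True if the spread of amplicon sizes is small enough to pool
    them in one library (assumed reading of the 80 bp / 20% rule)."""
    shortest, longest = min(sizes_bp), max(sizes_bp)
    diff = longest - shortest
    # Assumption: either bound is sufficient; 20% is relative to the longest
    return diff <= max_diff_bp or diff <= max_rel_diff * longest


print(amplicon_pool_ok([400, 450, 470]))  # tight peak: True
print(amplicon_pool_ok([300, 600]))       # several sizes: False
```

A pool that fails the check would need fractionation or size selection and separate libraries, as in Examples 2 and 3.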
Size-related bias in amplicon-seq Courtesy Mikael Brandström Durling, Forest Mycology and Pathology, SLU
When you sequence an amplicon …
• On MiSeq: FW read and RW read
• On Ion: FW read
Main types of equipment & applications
• Illumina (HiSeq, NextSeq, X10, MiSeq, MiniSeq, NovaSeq): short paired reads, HIGH throughput
• Ion Torrent (PGM, Ion Proton, Ion S5 XL): short single-end reads, FAST throughput
• PacBio (RSII, SEQUEL): ultra-long reads, FAST throughput
Applications, row by row in column order Illumina / Ion Torrent / PacBio: Human WGS; mRNA and miRNA; Long amplicons; Re-sequencing 30x; Exome; Re-sequencing; mRNA and miRNA; ChIP-seq; De novo sequencing; De novo transcriptome; Short amplicons; Novel isoform discovery; Exome; Gene panels; Fusion transcript analysis; ChIP-seq; Clinical samples; Haplotype phasing; Short amplicons; Clinical samples; Methylation
Other technologies for scaffolding of genomes
• 10x Chromium -> Illumina sequencing
• BioNano Irys, optical mapping
What is “The BEST”?
SAMPLE QUALITY REQUIREMENTS
Sample prep: take home message PCR-quality sample and NGS-quality sample are two completely different things
Making an NGS library
• DNA QC – paramount importance
• Shearing & size selection
• Ligation of sequencing adaptors, technology specific
• Amplification
Library complexity: suboptimal sample vs good sample (source: https://www.kapabiosystems.com)
DNA quality requirements
Gel:
• Some DNA left in the well
• Sharp band of 20+ kb
• No smear of degraded DNA
• No sign of proteins
• No sign of RNA
NanoDrop:
• 260/280 = 1.8 – 2.0
• 260/230 = 2.0 – 2.2
Qubit or Picogreen:
• 10 kb insert libraries: 3-5 µg
• 20 kb insert libraries: 10-20 µg
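The NanoDrop ranges above are easy to apply as an automated pre-submission check. The sketch below only encodes the two absorbance ratios from the slide; the function name `nanodrop_issues` is hypothetical, and passing it does not replace the gel and Qubit/Picogreen checks.

```python
def nanodrop_issues(a260_280: float, a260_230: float) -> list:
    """Return the list of absorbance ratios outside the recommended ranges."""
    issues = []
    if not 1.8 <= a260_280 <= 2.0:
        issues.append("260/280 outside 1.8-2.0")
    if not 2.0 <= a260_230 <= 2.2:
        issues.append("260/230 outside 2.0-2.2")
    return issues


print(nanodrop_issues(1.85, 2.10))  # prints []
print(nanodrop_issues(1.60, 1.50))  # both ratios flagged
```

An empty list means both ratios fall inside the recommended windows.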
Example: