
Olga Vinnere Pettersson, PhD, National Genomics Infrastructure hosted by SciLifeLab, Uppsala Node (UGC)



  1. Olga Vinnere Pettersson, PhD, National Genomics Infrastructure hosted by SciLifeLab, Uppsala Node (UGC), Version 6.3

  2. Outline: www.robustpm.com • A bit of history • NGS technologies & sample prep • NGS applications • National Genomics Infrastructure – Sweden

  3. What is sequencing? https://figures.boundless-cdn.com

  4. Once upon a time… • Frederick Sanger and Alan Coulson: Chain Termination Sequencing (1977), Nobel prize 1980 • Principle: SYNTHESIS of DNA is randomly TERMINATED at different points, because the terminating nucleotide lacks the OH- group at the 3' position of deoxyribose • Separation of fragments that are 1 nucleotide different in size! • 1 molecule sequenced at a time = 1 read • Capillary sequencer: 384 reads per run

  5. 2006 REVOLUTION • Thousands of molecules sequenced in parallel • 1 mln reads sequenced per run • Roche 454 GS FLX

  6. Technologies

  7. NGS technologies
     Company | Platform | Amplification | Sequencing method
     Roche | 454 (until 2016) | emPCR | Pyrosequencing
     Illumina | HiSeq, MiSeq, NextSeq, X10 | Bridge PCR | Synthesis
     LifeTechnologies (Thermo Fisher) | Ion Torrent, Ion Proton, S5 | emPCR | Synthesis (pH)
     Pacific Biosciences | RSII, SEQUEL | None | Synthesis (SMRT)
     Complete Genomics | Nanoballs | None | Ligation
     Oxford Nanopore* | MinION, GridION | None | Flow
     RIP technologies: Helicos, Polonator, SOLiD, 454 etc.
     In development: Tunneling currents, nanopores, etc.

  8. Differences between platforms • Technology: chemistry + signal detection • Run times vary from hours to days • Production ranges from Mb to Gb • Read length from <100 bp to >20 kb • Error rate per base from 0.1% to 15% • Cost per base

  9. Illumina
     Instrument | Yield and run time | Read length | Error rate | Error type
     HiSeq2500 | 120 Gb – 600 Gb (27 h or standard run) | 100x100 (250x250) | 0.1% | Subst
     MiSeq | 540 Mb – 15 Gb (4 – 48 hours) | Up to 350x350 | 0.1% | Subst
     HiSeqXten | 800 Gb – 1.8 Tb (3 days) | 150x150 | as above | as above
     Main applications • Whole genome, exome and targeted reseq • Transcriptome analyses • Methylome and ChIP-seq • Rapid targeted resequencing (MiSeq) • Human genome seq (Xten)
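The yields quoted for Illumina instruments translate into sequencing depth by dividing the number of sequenced bases by the genome size. The sketch below shows that arithmetic for the HiSeqXten range; the human genome size of 3.1 Gb and the function names are assumptions made for illustration, and the estimate ignores duplicates, low-quality reads and unmapped reads.

```python
def mean_coverage(yield_bases: float, genome_size_bases: float) -> float:
    """Average depth of coverage = sequenced bases / genome size."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return yield_bases / genome_size_bases


def genomes_per_run(yield_bases: float, genome_size_bases: float,
                    target_depth: float) -> int:
    """Number of genomes that fit in one run at the target depth (rounded down)."""
    return int(mean_coverage(yield_bases, genome_size_bases) // target_depth)


# Assumption: approximate haploid human genome size, not taken from the slides.
HUMAN_GENOME = 3.1e9

# HiSeqXten yield range from the table: 800 Gb to 1.8 Tb per run.
low = genomes_per_run(800e9, HUMAN_GENOME, target_depth=30)
high = genomes_per_run(1.8e12, HUMAN_GENOME, target_depth=30)
print(f"Human genomes at 30x per run: {low} to {high}")
```

Under these assumptions a single run covers roughly 8 to 19 human genomes at 30x.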

  10. Illumina: bridge amplification • 200M fragments per lane • Bridge amplification • Ends with blocking of free 3'-ends and hybridisation of sequencing primer

  11. Ion Torrent
      Chip | Yield and run time | Read length
      314, 316, 318 (PGM) | 0.1 – 1 Gb, 3 hrs | 200 – 400 bp
      P-I (Proton) | 10 Gb, 4 hrs | 200 bp
      520, 530, 540 (S5) | 1 Gb – 10 Gb, 3 hrs | 400 (600) bp (except 540)
      Main applications • Microbial and metagenomic sequencing • Targeted re-sequencing (gene panels) • Clinical sequencing

  12. Ion Torrent - H+ ion-sensitive field effect transistors

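  13. Instrument | Chip | Reads | Read length | Yield
      Ion PGM | 314 | 250 000 | 400 bp | 100 Mb
      Ion PGM | 316 | 4 mln | 400 bp | 500 Mb
      Ion PGM | 318 | 9 mln | 400 bp | 1 Gb
      Ion S5XL | 520 | 8 mln | 400 bp | 1 Gb
      Ion S5XL | 530 | 15-20 mln | 400 bp | 5 Gb
      Ion S5XL | 540 | 90 mln | 200 bp | 10 Gb
      Ion Proton | PI | 90 mln | 200 bp | 10-18 Gb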

  14. PacBio SMRT-technology (Single-Molecule, Real-Time DNA sequencing)
      Instrument | Yield and run time | Read length | Error rate | Error type
      RS II | 250 Mb – 1.3 Gb / 30 – 360 min, per SMRTCell | 250 bp – 30 kb (74 kb) | 15% (on a single passage!) | Insertions, random
      SEQUEL | 2-6 Gb per SMRT, 30-360 min | 250 bp – 25 kb | as RSII | as RSII

  15. PacBio SMRT - technology Single Molecule Real Time

  16. SMRT sequencing: common misconceptions
      High error rate? • Irrelevant, because errors are random • Depending on coverage • Examples: 8 Mb genome, 8 SNPs detected; 65 kb construct: 100% correct sequence; detection of low frequency mutations
      High price? • Not for small genomes • Bioinfo-time to assemble short reads vs bioinfo-time to assemble long reads • Better assembly quality • Single-molecule reads without PCR-bias
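Why a 15% single-pass error rate can matter little once coverage is sufficient can be illustrated with a binomial calculation. This is a minimal sketch under a deliberately pessimistic toy model, not the method used by any real consensus caller: errors are assumed independent between reads, and all erroneous reads are assumed to agree on the same wrong call. The function name is hypothetical.

```python
from math import comb


def consensus_error(per_read_error: float, depth: int) -> float:
    """Probability that at least half of `depth` reads are wrong at one position.

    Toy model: independent errors, all wrong reads agree with each other,
    and a tie counts as a wrong consensus.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    need = (depth + 1) // 2  # smallest number of wrong reads that breaks the vote
    p, q = per_read_error, 1.0 - per_read_error
    return sum(comb(depth, k) * p**k * q**(depth - k)
               for k in range(need, depth + 1))


for depth in (1, 5, 15, 30):
    print(f"depth {depth:>2}: consensus error ~ {consensus_error(0.15, depth):.2e}")
```

In this model the consensus error drops quickly as depth grows, which is the sense in which random errors become a question of coverage. Systematic errors would not average out this way.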

  17. Oxford Nanopore MinION • Reads up to 100 kb • 1D and 2D reads • 15-40% error rate • Life time 5 days

  18. Main types of equipment
      PacBio RSII | Ultra-long reads | FAST throughput
      Illumina HiSeq, Illumina Xten, Illumina MiSeq | Short paired reads | HIGH throughput
      Ion Torrent PGM, Ion Proton, Ion S5 XL | Short single-end reads | FAST throughput

  19. Applications

  20. NGS/MPS applications • Whole genome sequencing: – De novo sequencing – Re-sequencing • Transcriptome sequencing: – mRNA-seq – miRNA – Isoform discovery • Target re-sequencing – Exome – Large portions of a genome – Gene panels – Amplicons

  21. De novo sequencing • Used to create a reference genome without previous reference

  22. De novo vs re-sequencing
      De novo: • No bias towards a reference • No template to adapt to • Many contigs • Works best for large-scale events
      Re-seq: • Finding similarities to a reference • Easier to identify SNPs and minor events • Fewer contigs • Novel events are lost

  23. De novo sequencing
      Illumina strategy. Sequencing: • PE library with 350 bp • PE library with 600 bp • MP library with 2 kb • MP library with 5-8-20 kb • PE: 50-100x, MP: 10-15x. Analysis: • ALLPATHS
      PacBio strategy. Sequencing: • 10-20 kb library • 50-80x (where 30x are reads above 10 kb). Analysis: • HGAP (haploid) • FALCON (diploid)

  24. Example: de novo PacBio; Crow
      Sequencing results: Number of SMRT cells: 70 • Total bases per SMRT: 1.39 Gb • Total reads per SMRT: 106 833
      Assembly results, FALCON
      Metric | PRIMARY | ALTERNATIVE
      N50 | 8.5 Mb | 23 kb
      N75 | 3.9 Mb | 18 kb
      Nr contigs | 4375 | 2614
      Longest contig | 36 Mb | 121 kb
      Total length | 1.09 Gb | 45 Mb
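The N50 and N75 values reported for the crow assembly are length-weighted statistics: N50 is the contig length at which contigs of that length or longer contain at least half of the assembled bases. A minimal sketch of the calculation follows, using made-up contig lengths since the crow contig list is not given here. The raw-depth estimate at the end uses the primary assembly length as a stand-in for genome size, which is an assumption.

```python
def nxx(contig_lengths, fraction=0.5):
    """Return the Nxx statistic (N50 for fraction=0.5, N75 for fraction=0.75)."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until the requested share is reached.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length


# Hypothetical toy assembly, lengths in kb (not the crow data).
toy = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print("N50:", nxx(toy, 0.5), "N75:", nxx(toy, 0.75))

# Raw depth for the crow example: 70 SMRT cells x 1.39 Gb each,
# divided by the 1.09 Gb primary assembly (assumed to approximate genome size).
raw_depth = 70 * 1.39e9 / 1.09e9
print(f"Approximate raw depth: {raw_depth:.0f}x")
```

N75 is always less than or equal to N50 for the same assembly, as in the crow table (3.9 Mb vs 8.5 Mb).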

  25. Transcriptome sequencing (RNA-seq) • TOTAL RNA: mRNA, miRNA, non-coding RNA • Splice isoforms • Dif.ex. • Annotation • Transcriptional regulation

  26. mRNA: rRNA depletion vs polyA selection
      Method | Pros | Cons | Recommended
      rRNA depletion | Captures on-going transcription; picks up non-coding RNA | Does not get rid of all rRNA; messy Dif.Ex. profile | 20-40 mln reads (single or PE)
      polyA selection | Gives a clean Dif.Ex. profile | Does not pick non-coding RNA | 5-20 mln reads
      Alternative for human RNA-seq: AmpliSeq Human Transcriptome panel: • faster, cheaper, works fine with FFPE • input: 50 ng total RNA • dif.ex. ONLY

  27. RNA-seq experimental setup • mRNA only: any kit • mRNA and miRNA: only specialized kits • Always use DNase! • RIN value above 8. • CONTROL vs experimental conditions • Biological replicates: 4 strongly recommended

  28. RNA-seq experimental setup PacBio Iso-seq : full-length transcriptome seq

  29. Amplicon sequencing Used a lot in metagenomics • Community analysis – rRNA genes & spacers (16S, ITS) – Functional genes • Genotyping by sequencing

  30. Amplicon sequencing: SIZE MATTERS… • Example 1: tight peak, OK FOR ANY NGS TECHNOLOGY • Example 2: several sizes, fractionation is needed • Example 3: broad peak; size selection is needed => we HAVE to make several libraries • Size difference among fragments must not exceed 80 bp (or 20% in length) • Reason: preferential amplification of short fragments
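The size rule for amplicon pools (no more than 80 bp, or 20% in length, between fragments) can be checked before library prep. The sketch below reports both criteria separately because the slide does not say how they combine. Measuring the 20% against the longest fragment is an assumption, and the function name is hypothetical.

```python
def size_spread(lengths_bp, max_bp=80, max_fraction=0.20):
    """Summarise the size spread of an amplicon pool against both limits."""
    if not lengths_bp:
        raise ValueError("no amplicon lengths given")
    shortest, longest = min(lengths_bp), max(lengths_bp)
    spread = longest - shortest
    fraction = spread / longest  # assumption: relative to the longest fragment
    return {
        "spread_bp": spread,
        "spread_fraction": fraction,
        "within_bp_limit": spread <= max_bp,
        "within_fraction_limit": fraction <= max_fraction,
    }


# Hypothetical pools: a tight peak and a pool with two distinct sizes.
print(size_spread([250, 260, 300]))
print(size_spread([300, 550]))
```

A pool that fails both checks would need fractionation or separate libraries, as described above.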

  31. Size-related bias in amplicon-seq Courtesy Mikael Brandström Durling, Forest Mycology and Pathology, SLU

  32. When you sequence an amplicon … On MiSeq FW read RW read On Ion FW read

  33. Main types of equipment & applications
      Illumina HiSeq, NextSeq, X10, MiSeq, MiniSeq, NovaSeq | Short paired reads | HIGH throughput
      Ion Torrent PGM, Ion Proton, Ion S5 XL | Short single-end reads | FAST throughput
      PacBio RSII, SEQUEL | Ultra-long reads | FAST throughput
      Applications: Human WGS mRNA and miRNA Long amplicons Re-sequencing 30x Exome Re-sequencing mRNA and miRNA ChIP-seq De novo sequencing De novo transcriptome Short amplicons Novel isoform discovery Exome Gene panels Fusion transcript analysis ChIP-seq Clinical samples Haplotype phasing Short amplicons Clinical samples Methylation

  34. Other technologies for scaffolding of genomes 10x Chromium -> Illumina sequencing BioNano Irys, optical mapping

  35. What is “The BEST”?

  36. SAMPLE QUALITY REQUIREMENTS

  37. Sample prep: take home message PCR-quality sample and NGS-quality sample are two completely different things

  38. Making an NGS library • DNA QC: paramount importance • Shearing & size selection • Ligation of sequencing adaptors, technology specific • Amplification

  39. Library complexity Suboptimal sample Good sample (source: https://www.kapabiosystems.com)

  40. DNA quality requirements • Gel: some DNA left in the well; sharp band of 20+ kb; no smear of degraded DNA; no sign of proteins; no sign of RNA • NanoDrop: 260/280 = 1.8 – 2.0; 260/230 = 2.0 – 2.2 • Qubit or Picogreen: 10 kb insert libraries: 3-5 ug; 20 kb insert libraries: 10-20 ug
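The numeric DNA requirements can be turned into a simple pre-submission check. This is a sketch only: the thresholds are copied from the slide, while the grouping of values (NanoDrop ratios vs. Qubit/Picogreen amounts) and all names are an interpretation of the flattened layout. It does not replace the gel-based checks.

```python
# Thresholds from the slide; how they group together is an interpretation.
RATIO_260_280 = (1.8, 2.0)
RATIO_260_230 = (2.0, 2.2)
MIN_INPUT_UG = {10: 3.0, 20: 10.0}  # insert size (kb) -> lower end of quoted range


def nanodrop_ok(a260_280: float, a260_230: float) -> bool:
    """True when both absorbance ratios fall inside the quoted ranges."""
    return (RATIO_260_280[0] <= a260_280 <= RATIO_260_280[1]
            and RATIO_260_230[0] <= a260_230 <= RATIO_260_230[1])


def enough_dna(insert_kb: int, amount_ug: float) -> bool:
    """True when the measured amount reaches the minimum for that library type."""
    if insert_kb not in MIN_INPUT_UG:
        raise ValueError(f"no requirement listed for {insert_kb} kb inserts")
    return amount_ug >= MIN_INPUT_UG[insert_kb]


# Hypothetical sample: clean ratios, 4 ug of DNA.
print(nanodrop_ok(1.9, 2.1), enough_dna(10, 4.0), enough_dna(20, 4.0))
```

The hypothetical sample would qualify for a 10 kb insert library but not for a 20 kb one.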

  41. Example:
