
Olga Vinnere Pettersson, PhD
National Genomics Infrastructure hosted by SciLifeLab, Uppsala Node (UGC)
Version 6.3

Outline
• A bit of history
• NGS technologies
• NGS applications
  – De Novo
  – RNA-seq
  – Targeted enrichment (hybridization & amplicon-Seq)
• National Genomics Infrastructure – Sweden
• Auxiliary technologies (10x Chromium, BioNano)
• Sample prep for NGS

What is sequencing?
[Figure labels: phosphate group, fluorophore, proton. Image source: https://figures.boundless-cdn.com]

Once upon a time…
• Frederick Sanger and Alan Coulson
Chain Termination Sequencing (1977), Nobel Prize 1980
Principle: SYNTHESIS of DNA is randomly TERMINATED at different points, because the terminating nucleotide lacks the OH-group at the 3' position of the deoxyribose.
Separation of fragments that are 1 nucleotide different in size!
1 molecule sequenced at a time = 1 read
Capillary sequencer: 384 reads per run
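The chain-termination principle on this slide can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: every possible termination point is represented exactly once, the last base of each fragment is assumed to be readable (as a labelled terminator would be), and the function name `sanger_read` is made up for this example.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sanger_read(template: str) -> str:
    """Recover the synthesized strand from a pool of terminated fragments."""
    # The polymerase synthesizes the reverse complement of the template.
    synthesized = "".join(COMPLEMENT[base] for base in reversed(template))
    # Random termination yields fragments ending at every position.
    fragments = [synthesized[:i] for i in range(1, len(synthesized) + 1)]
    random.shuffle(fragments)  # in the tube, the fragments are mixed
    # Separation resolves fragments that differ by 1 nucleotide in size.
    fragments.sort(key=len)
    # Reading the terminal base of each fragment, shortest first, gives the read.
    return "".join(fragment[-1] for fragment in fragments)

print(sanger_read("ATGC"))
```

Sorting by length stands in for the electrophoretic separation; the order of terminal bases is the sequence of one read.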

2006 REVOLUTION
Thousands of molecules sequenced in parallel
1 million reads sequenced per run
Roche 454 GS FLX

Technologies

Differences between platforms
• Technology: chemistry + signal detection
• Run times vary from hours to days
• Production ranges from Mb to Gb
• Error rate per base: from 0.1% to 15%
• Cost per base
• Library construction
• Read length: from <100 bp to >20 kb

Read length
[Figure: read length per platform; only axis ticks survive, from 110 to 1 000 000]

Illumina
Instrument | Yield and run time | Read length | Error rate | Error type
HiSeq2500 | 120 Gb – 600 Gb (27 h or standard run) | 100x100 (250x250) | 0.1% | Subst
MiSeq | 540 Mb – 15 Gb (4 – 48 hours) | Up to 350x350 | 0.1% | Subst
HiSeqXten | 800 Gb – 1.8 Tb (3 days) | 150x150 | 0.1% | Subst
Main applications
• Whole genome, exome and targeted reseq
• Transcriptome analyses
• Methylome and ChIP-seq
• Rapid targeted resequencing (MiSeq)
• Human genome seq (Xten)
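The yield figures on this slide can be related to sequencing depth with simple arithmetic: mean coverage is the yield divided by the genome size. The sketch below assumes a human genome size of about 3.1 Gb and a 30x target depth. Both are illustrative assumptions, not values from this slide, and `mean_coverage` is a hypothetical helper name.

```python
HUMAN_GENOME_BP = 3.1e9  # assumed approximate human genome size
TARGET_DEPTH = 30        # assumed target depth for whole-genome re-sequencing

def mean_coverage(yield_bases: float, genome_size_bases: float) -> float:
    """Mean depth = total sequenced bases / genome size."""
    return yield_bases / genome_size_bases

# Upper yield figures taken from the table above.
max_yield = {"HiSeq2500": 600e9, "MiSeq": 15e9, "HiSeqXten": 1.8e12}

for instrument, bases in max_yield.items():
    depth = mean_coverage(bases, HUMAN_GENOME_BP)
    genomes = int(depth // TARGET_DEPTH)
    print(f"{instrument}: about {depth:.0f}x in total, {genomes} human genomes at {TARGET_DEPTH}x")
```

This ignores duplicates, adapter reads and uneven coverage, so real runs deliver somewhat less usable depth.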

Illumina: bridge amplification
• 200M fragments per lane
• Bridge amplification
• Ends with blocking of free 3'-ends and hybridisation of sequencing primer

Illumina : ExAmp = black box Affected platforms : HiSeqXten, HiSeq 3000 and 4000, NovaSeq

Ion
Chip | Yield and run time | Read length
314, 316, 318 (PGM) | 0.1 – 1 Gb, 3 hrs | 200 – 400 bp
P-I (Proton) | 10 Gb, 4 hrs | 200 bp
520, 530, 540 (S5) | 1 Gb – 10 Gb, 3 hrs (except 540) | 200 – 600 bp
Main applications
• Microbial and metagenomic sequencing
• Targeted re-sequencing (gene panels)
• Clinical sequencing

Ion Torrent: H+ ion-sensitive field-effect transistors
[Figure label: bead]

PacBio
Single-Molecule, Real-Time DNA sequencing
Instrument | Yield/cell and run time | Read length | Error rate | Error type
RS II | 250 Mb – 1.8 Gb, 30 – 600 min | 250 bp – 30 kb (78 kb) | 15% (single pass), 0.0001% (circular consensus) | Insertions, random
SEQUEL | 2 – 6 Gb, 30 – 600 min | 250 bp – 25 kb | as RSII | as RSII

PacBio: SMRT - technology SMRT = Single Molecule Real Time

SMRT sequencing: common misconceptions
High error rate?
Irrelevant, because errors are random
Depending on coverage
Examples:
• 8 Mb genome, 8 SNPs detected
• 65 kb construct: 100% correct sequence
• Detection of low frequency mutations
High price?
Not for small genomes
Bioinfo-time to assemble short reads vs bioinfo-time to assemble long reads
Better assembly quality
Single-molecule reads without PCR-bias
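Why random errors matter less as coverage grows can be sketched with a binomial calculation. The example assumes that errors are independent between reads, that the consensus is a simple majority vote, and, as a worst case, that all erroneous reads agree on the same wrong base. Real consensus callers are more sophisticated, and `majority_error` is a hypothetical name.

```python
from math import comb

def majority_error(p: float, n: int) -> float:
    """Worst-case probability that a majority vote over n reads is wrong
    at one position, given an independent per-read error rate p."""
    # The vote fails when more than half of the reads are wrong;
    # for even n, ties are counted as failures.
    k_min = n // 2 + 1 if n % 2 else n // 2
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

for coverage in (1, 5, 15, 31):
    print(coverage, majority_error(0.15, coverage))
```

With a 15% per-read error rate, the consensus error shrinks rapidly as more reads cover the same position, which is the point made on the slide.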

Oxford Nanopore MinION
Reads up to 800 kb
10-15% error rate
Lifetime: 5 days

Main types of equipment
PacBio RSII, PacBio Sequel: ultra-long reads, FAST throughput
Illumina HiSeq, Illumina Xten, Illumina MiSeq: short paired reads, HIGH throughput
Ion Torrent PGM, Ion Proton, Ion S5 XL: short single-end reads, FAST throughput

Applications

NGS/MPS applications
• Whole genome sequencing:
  – De novo sequencing
  – Re-sequencing
• Transcriptome sequencing:
  – mRNA-seq
  – miRNA
  – Isoform discovery
• Target re-sequencing
  – Exome
  – Large portions of a genome
  – Gene panels
  – Amplicons

De novo sequencing • Used to create a reference genome without previous reference

De novo vs re-sequencing
De novo:
• No bias towards a reference
• No template to adapt to
• Many contigs
• Works best for large-scale events
Re-seq:
• Finding similarities to a reference
• Easier to identify SNPs and minor events
• Fewer contigs
• Novel events are lost

De novo – do it with long reads!

Example: de novo PacBio; Crow
Sequencing results
Number of SMRT cells: 70
Total bases per SMRT: 1.39 Gb
Total reads per SMRT: 106 833
Assembly results, FALCON
Metric | PRIMARY | ALTERNATIVE
N50 | 8.5 Mb | 23 kb
N75 | 3.9 Mb | 18 kb
Nr contigs | 4375 | 2614
Longest contig | 36 Mb | 121 kb
Total length | 1.09 Gb | 45 Mb
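For readers unfamiliar with the N50 and N75 figures reported here, the following is a minimal sketch of how such statistics are usually computed from a list of contig lengths. The function name `nx` and the toy lengths are illustrative and do not come from the crow assembly.

```python
def nx(contig_lengths, x=50):
    """Return the Nx statistic: the length L such that contigs of length >= L
    together contain at least x% of all assembled bases."""
    ordered = sorted(contig_lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("contig_lengths must not be empty")

toy_contigs = [10, 8, 6, 4, 2]  # made-up contig lengths
print("N50:", nx(toy_contigs, 50))
print("N75:", nx(toy_contigs, 75))
```

A higher N50 for the same total length means the assembly is made of fewer, longer contigs.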

Re-sequencing
Population studies: Illumina HiSeq is The Best
[Figure labels: England and Southern Scotland, Central Sweden, Northern Sweden, Italy, Finland, Spain]

Transcriptome sequencing (RNA-seq)
TOTAL RNA: mRNA, miRNA, non-coding RNA
• Dif.ex.
• Splice isoforms
• Annotation
• Transcriptional regulation

mRNA: rRNA depletion vs polyA selection
rRNA depletion
Pros:
• Captures on-going transcription
• Picks up non-coding RNA
Cons:
• Does not get rid of all rRNA
• Messy Dif.Ex. profile
Recommended: 20-40 million reads (single or PE)
polyA selection
Pros:
• Gives a clean Dif.Ex. profile
Cons:
• Does not pick up non-coding RNA
Recommended: 5-20 million reads
Alternative for human RNA-seq: AmpliSeq Human Transcriptome panel:
• faster, cheaper, works fine with FFPE
• input: 50 ng total RNA
• dif.ex. ONLY

RNA-seq experimental setup
• mRNA only: any kit
• mRNA and miRNA: only specialized kits
• Always use DNase!
• RIN value above 8.
• CONTROL vs experimental conditions
• Biological replicates: 4 strongly recommended

RNA-seq experimental setup PacBio Iso-seq : full-length transcriptome seq

Targeted re-sequencing
Suitable applications
- Metagenomics
- Resolving complex regions
- Low frequency mutations
- Human re-sequencing
- Clinical diagnostics
- ….
Approaches for target-seq
- Hybridization capture (Agilent, NimbleGen, MyBaits)
- PCR (Amplicon sequencing)
  - Long-range
  - Conventional
  - Multiplex
- Experimental (TLA, Samplix, CRISPR-Cas9)

Amplicon sequencing
SIZE MATTERS…
Size difference among fragments must not exceed 80 bp (or 20% in length)
Reason – preferential amplification of short fragments
Example 1: tight peak, OK FOR ANY NGS TECHNOLOGY
Example 2: several sizes, fractionation is needed => we HAVE to make several libraries
Example 3: broad peak; size selection is needed
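The "several libraries" case can be sketched as a simple grouping step: sort the amplicon lengths and start a new library whenever the spread within the current group would exceed the limit. This sketch models only the 80 bp criterion, not the 20% alternative; the function name `group_amplicons` and the example lengths are hypothetical.

```python
def group_amplicons(lengths_bp, max_spread_bp=80):
    """Greedily split amplicon lengths into groups whose size spread
    (longest minus shortest) stays within max_spread_bp."""
    groups = []
    for length in sorted(lengths_bp):
        # Compare against the shortest amplicon of the current group.
        if groups and length - groups[-1][0] <= max_spread_bp:
            groups[-1].append(length)
        else:
            groups.append([length])
    return groups

# Made-up amplicon sizes in bp; each resulting group would be one library.
for library in group_amplicons([250, 300, 320, 450, 480, 900]):
    print(library)
```

Because the lengths are sorted first, this greedy pass produces the smallest number of groups that respect the spread limit.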

Size-related bias in amplicon-seq Courtesy Mikael Brandström Durling, Forest Mycology and Pathology, SLU

Amplicon sequencing: Technologies
Illumina MiSeq: paired-end reads (FW read, RW read)
Ion S5XL: single-end reads
PacBio RSII: circular consensus reads

Amplicon sequencing: Barcoding strategies
[Figure: barcoding strategies for Illumina and Ion, and for PacBio; labels: USER, NGI]

Main types of equipment & applications
Illumina (HiSeq, NextSeq, X10, MiSeq, MiniSeq, NovaSeq): short paired reads, HIGH throughput
• Human WGS 30x
• Re-sequencing
• mRNA and miRNA
• De novo transcriptome
• Exome
• ChIP-seq
• Short amplicons
• Methylation
Ion (Ion Torrent PGM, Ion Proton, Ion S5 XL): short single-end reads, FAST throughput
• mRNA and miRNA
• Exome
• ChIP-seq
• Short amplicons
• Gene panels
• Clinical samples
PacBio (RSII, SEQUEL): ultra-long reads, FAST throughput
• Long amplicons
• Re-sequencing
• De novo sequencing
• Novel isoform discovery
• Fusion transcript analysis
• Haplotype phasing
• Clinical samples

But there is more!

10x Genomics (Chromium) Fragment length: 50 kb – 100+ Kb

BioNano Genomics (Irys) Fragment length: 100 kb – 3 Mb

SAMPLE QUALITY REQUIREMENTS

Sample prep: take home message PCR-quality sample and NGS-quality sample are two completely different things

Making an NGS library
DNA QC – paramount importance
Shearing & size selection
Ligation of sequencing adaptors, technology specific
Amplification

NGS library
DNA QC – paramount importance
Shearing & size selection

Library complexity
[Figure: suboptimal sample vs good sample (source: https://www.kapabiosystems.com)]

DNA quality requirements
• Some DNA left in the well
• Sharp band of 20+ kb
• No smear of degraded DNA
• No sign of proteins
• No sign of RNA
NanoDrop:
• 260/280 = 1.8 – 2.0
• 260/230 = 2.0 – 2.2
Qubit or Picogreen:
• 10 kb insert libraries: 3-5 µg
• 20 kb insert libraries: 10-20 µg

Example:

What do absorption ratios tell us?
Pure DNA 260/280: 1.8 – 2.0
• < 1.8: Too little DNA compared to other components of the solution; presence of organic contaminants: proteins and phenol; glycogen. These absorb at 280 nm.
• > 2.0: High share of RNA.
Pure DNA 260/230: 2.0 – 2.2
• < 2.0: Salt contamination, humic acids, peptides, aromatic compounds, polyphenols, urea, guanidine, thiocyanates (the latter three are common kit components). These absorb at 230 nm.
• > 2.2: High share of RNA, very high share of phenol, high turbidity, dirty instrument, wrong blank.
Photometrically active contaminants: phenol, polyphenols, EDTA, thiocyanate, protein, RNA, nucleotides (fragments below 5 bp)
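The ranges on this slide translate directly into a small checking routine. The sketch below applies only the thresholds listed above to raw absorbance readings; the function name `assess_purity`, the message wording and the example readings are hypothetical, and a ratio inside the range does not by itself prove that a sample is NGS-quality.

```python
def assess_purity(a260: float, a280: float, a230: float):
    """Return (260/280, 260/230, notes) using the ranges from the slide."""
    r280 = a260 / a280
    r230 = a260 / a230
    notes = []
    if r280 < 1.8:
        notes.append("260/280 below 1.8: possible protein, phenol or glycogen contamination")
    elif r280 > 2.0:
        notes.append("260/280 above 2.0: possible high share of RNA")
    if r230 < 2.0:
        notes.append("260/230 below 2.0: possible salt or kit-component contamination")
    elif r230 > 2.2:
        notes.append("260/230 above 2.2: possible RNA, phenol, turbidity or blanking problem")
    return r280, r230, notes

# Made-up absorbance readings for two samples.
for readings in [(1.0, 0.55, 0.48), (1.0, 0.70, 0.80)]:
    r280, r230, notes = assess_purity(*readings)
    print(f"260/280 = {r280:.2f}, 260/230 = {r230:.2f}", notes or ["within range"])
```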
