Olga Vinnere Pettersson, PhD, National Genomics Infrastructure hosted by SciLifeLab, Uppsala Node (UGC). Version 6.1
Outline: • 4 slides about history • NGS technologies • NGS applications • NGS sample quality requirements • Philosophical reflection • National Genomics Infrastructure – Sweden
Once upon a time… • Frederick Sanger and Alan Coulson: Chain Termination Sequencing (1977), Nobel prize 1980 • Principle: SYNTHESIS of DNA is randomly TERMINATED at different points • Separation of fragments that differ in size by 1 nucleotide
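The chain-termination principle can be illustrated with a toy simulation. This is a minimal sketch rather than a model of the real chemistry: the template string, the `termination_prob` value and both function names are assumptions made for illustration, and fragments are written in the template's own letters instead of the complementary strand to keep the read-out easy to follow.

```python
import random

def simulate_chain_termination(template, n_molecules=5000, termination_prob=0.1, seed=0):
    """Return the set of terminated fragments produced while copying `template`.

    Each simulated synthesis extends one base at a time and stops at a random
    position, mimicking the incorporation of a chain-terminating nucleotide.
    """
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_molecules):
        for i in range(1, len(template) + 1):
            # Stop here with a fixed probability, or when the template ends.
            if rng.random() < termination_prob or i == len(template):
                fragments.add(template[:i])
                break
    return fragments

def read_sequence(fragments):
    """Sort fragments by length (the 'gel') and read the last base of each one."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))

template = "ATGCGTACGTTAGC"  # hypothetical template
fragments = simulate_chain_termination(template)
print(read_sequence(fragments))
```

When every fragment length is represented, reading the terminal base of each fragment in order of size reproduces the sequence, which is why separation at 1-nucleotide resolution matters.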
Sequencing genomes using Sanger's method • Extract & purify genomic DNA • Fragmentation • Make a clone library • Sequence clones • Align sequences (-> contigs -> scaffolds) • Close the gaps • Cost/Mb = $1000, and it takes TIME
DNA sequencing revolution - Sweden • Massively parallel sequencing (454, Illumina, Life Tech) • Human genome • James Watson's genome • Center for Metagenomic Sequence Analysis (KAW) • National Genomics Infrastructure (NGI) • Science for Life Laboratory (SciLifeLab)
Workload at NGI – Sweden 2010-2014 [Chart: number of samples (axis 0 to 8000) and projects (axis 0 to 250) per quarter, Q3-10 to Q3-14]
NGS technologies
  Company | Platform | Amplification | Sequencing method
  Roche | 454 (until 2016) | emPCR | Pyrosequencing
  Illumina | HiSeq, MiSeq, NextSeq, X10 | Bridge PCR | Synthesis
  Life Technologies (Thermo Fisher) | Ion Torrent, Ion Proton, S5 | emPCR | Synthesis (pH)
  Pacific Biosciences | RSII | None | Synthesis (SMRT)
  Complete Genomics | Nanoballs | None | Ligation
  Oxford Nanopore* | MinION, GridION | None | Flow
  RIP technologies: Helicos, Polonator, SOLiD, 454 etc. In development: tunneling currents, nanopores, etc.
Differences between platforms • Technology: chemistry + signal detection • Run times vary from hours to days • Production range from Mb to Gb • Read length from <100 bp to > 20 Kbp • Error rate per base from 0.1% to 15% • Cost per base
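The per-base figures of 0.1% and 15% are error rates, and they are often easier to compare on the Phred quality scale, Q = -10 · log10(p). The Phred scale is a standard convention and is not mentioned on the slide; the function name below is an assumption made for this sketch.

```python
import math

def error_rate_to_phred(error_rate):
    """Convert a per-base error probability (0 < p <= 1) to a Phred quality score."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)

# The two extremes quoted on the slide.
for label, rate in [("0.1% error", 0.001), ("15% error", 0.15)]:
    print(f"{label}: Q{error_rate_to_phred(rate):.1f}")
```

On this scale the most accurate platforms sit around Q30, while a 15% single-pass error rate is close to Q8.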
Illumina
  Instrument | Yield and run time | Read length | Error rate | Error type
  HiSeq2500 | 120 Gb – 600 Gb, 27h or standard run | 100x100 (250x250) | 0.1% | Subst
  MiSeq | 540 Mb – 15 Gb (4 – 48 hours) | Up to 350x350 | 0.1% | Subst
  HiSeqXten | 800 Gb - 1.8 Tb (3 days) | 150x150 | “ | “
  Main applications: • Whole genome, exome and targeted reseq • Transcriptome analyses • Methylome and ChIP-seq • Rapid targeted resequencing (MiSeq) • Human genome seq (Xten)
Illumina
Life Technologies - Ion Torrent & Ion Proton
  Chip | Yield - run time | Read length
  314, 316, 318 (PGM) | 0.1 – 1 Gb, 3 hrs | 200 – 400 bp
  P-I (Proton) | 10 Gb, 4 hrs | 200 bp
  520, 530, 540 (S5) | 1 Gb – 10 Gb, 3 hrs | 400 bp (except 540)
  Main applications: • Microbial and metagenomic sequencing • Targeted re-sequencing (gene panels) • Clinical sequencing
Ion Torrent - H+ ion-sensitive field effect transistors
Ion chips: 314 chip (10 Mb), 316 chip (100 Mb), 318 chip (1 Gb), PI chip (10 Gb), S5. Read length: 200 – 400 bp; 200 bp. Suitable genomes: virus, bacteria, small eukaryote; eukaryote.
PacBio SMRT technology: Single-Molecule, Real-Time DNA sequencing
  Instrument | Yield and run time | Read length | Error rate | Error type
  RS II | 250 Mb – 1.3 Gb / SMRTCell, 30 - 240 min | 250 bp – 30 000 bp (70 000 bp) | 15% (on a single passage!) | Insertions, random
PacBio SMRT - technology Single Molecule Real Time
Oxford Nanopore MinION • Reads up to 100k • 1D and 2D reads • 15-40% error rate • Lifetime 5 days
Main types of equipment
  Equipment | Read type | Throughput
  PacBio RSII | Ultra-long reads | FAST throughput
  Illumina HiSeq, Illumina Xten, Illumina MiSeq | Short paired reads | HIGH throughput
  Ion Torrent PGM, Ion Proton, Ion S5 XL | Short single-end reads | FAST throughput
NGS/MPS applications • Whole genome sequencing: – De novo sequencing – Re-sequencing • Transcriptome sequencing: – mRNA-seq – miRNA – Isoform discovery • Target re-sequencing: – Exome – Large portions of a genome – Gene panels – Amplicons
De novo sequencing • Used to create a reference genome without previous reference
De novo sequencing: two strategies
  Illumina strategy. Sequencing: • PE library with 350 bp • PE library with 600 bp • MP library with 2 kb • MP library with 5-8-20 kb. Coverage: PE 50-100x, MP 10-15x. Analysis: • ALLPATHS
  PacBio strategy. Sequencing: • 10-20 kb library, 50-80x (where 30x are reads above 10 kb). Analysis: • HGAP (haploid) • FALCON (diploid)
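Coverage targets such as 50-80x translate into a sequencing yield once a genome size is fixed: required bases = genome size × coverage. The sketch below is illustrative only; the 100 Mb genome and the function names are assumptions, and the per-cell yields are taken from the 250 Mb – 1.3 Gb per SMRTCell range quoted for the RS II.

```python
import math

def required_yield_bp(genome_size_bp, coverage):
    """Total number of bases to sequence for a target average coverage."""
    return genome_size_bp * coverage

def smrt_cells_needed(genome_size_bp, coverage, yield_per_cell_bp):
    """Number of SMRTCells needed, rounded up to whole cells."""
    return math.ceil(required_yield_bp(genome_size_bp, coverage) / yield_per_cell_bp)

genome = 100_000_000  # hypothetical 100 Mb genome
print(required_yield_bp(genome, 50) / 1e9, "Gb needed for 50x")
print(smrt_cells_needed(genome, 50, 1_000_000_000), "cells at 1 Gb per cell")
print(smrt_cells_needed(genome, 50, 250_000_000), "cells at 250 Mb per cell")
```

The spread between the two cell counts shows why the per-cell yield matters as much as the coverage target when budgeting a de novo project.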
Transcriptome sequencing (RNA-seq) • TOTAL RNA: mRNA, miRNA, non-coding RNA • Uses: splice isoforms, differential expression (Dif.Ex.), annotation, transcriptional regulation
mRNA: rRNA depletion vs polyA selection
  Method | Pros | Cons | Recommended
  rRNA depletion | Captures on-going transcription; picks up non-coding RNA | Does not get rid of all rRNA; messy Dif.Ex. profile | 20-40 million reads (single or PE)
  polyA selection | Gives a clean Dif.Ex. profile | Does not pick up non-coding RNA | 5-20 million reads
  Alternative for human RNA-seq: AmpliSeq Human Transcriptome panel: • faster, cheaper, works fine with FFPE • input: 50 ng total RNA • Dif.Ex. ONLY
RNA-seq: equipment-related bias • De novo transcriptome: Illumina PE only • RNA-seq with a good reference: – Illumina 50 bp single end for Dif.Ex. – Illumina PE for splice information – Ion Proton single end in both cases • miRNA: Illumina or Ion Proton, but stick to the same technology throughout the project!
RNA-seq experimental setup • mRNA only: any kit • mRNA and miRNA: only specialized kits • Always use DNase! • RIN value above 8 • CONTROL vs experimental conditions • Biological replicates: 4 strongly recommended
Amplicon sequencing. Used a lot in metagenomics • rRNA genes & spacers (16S, ITS) • Functional genes • Genotyping by sequencing
Amplicon sequencing: SIZE MATTERS… • Size difference among fragments must not exceed 80 bp (or 20% in length) • Reason: preferential amplification of short fragments • Example 1: tight peak, OK FOR ANY NGS TECHNOLOGY • Example 2: several sizes, fractionation is needed => we HAVE to make several libraries • Example 3: broad peak; size selection is needed
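The size rule can be turned into a quick pre-submission check. This is a minimal sketch under stated assumptions: the slide does not say whether the 80 bp and 20% limits apply together or as alternatives, so the code conservatively requires both, and it measures the 20% against the longest fragment. The function name and the example lengths are hypothetical.

```python
def can_pool_amplicons(lengths_bp, max_diff_bp=80, max_diff_fraction=0.20):
    """Return True if all fragment sizes are close enough for a single library.

    Assumption: both limits must hold, and the relative limit is taken
    with respect to the longest fragment.
    """
    shortest, longest = min(lengths_bp), max(lengths_bp)
    difference = longest - shortest
    return difference <= max_diff_bp and difference <= max_diff_fraction * longest

print(can_pool_amplicons([400, 450, 470]))  # tight peak
print(can_pool_amplicons([300, 500]))       # several sizes
print(can_pool_amplicons([100, 150]))       # within 80 bp but more than 20% apart
```

Fragments that fail the check would need to be fractionated or size-selected into separate libraries, as in Examples 2 and 3.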
Size-related bias in amplicon-seq Courtesy Mikael Brandström Durling, Forest Mycology and Pathology, SLU
When you sequence an amplicon… • On MiSeq: FW read and RW read • On Ion: FW read
Sequence capture. When you are not interested in the entire genome: • Exome • Regions of interest • Genes of interest (gene panels). Two approaches: hybridization-based capture and PCR-based capture
Sequence capture: technology choice • AmpliSeq panels (multiplex PCR), Ion only: – Comprehensive Cancer panel – Cancer Hotspot panel – AmpliSeq Human Exome, etc. – AmpliSeq Human Transcriptome • Hybridization-based: any technology • Non-multiplex PCR, any technology: – Short reads (up to 500 bp): Illumina – Medium reads (up to 500 bp): Ion – Long reads (500 bp – 20 kb): PacBio
Main types of equipment & applications
  Equipment | Read type | Throughput | Applications
  Illumina HiSeq, Illumina Xten, Illumina MiSeq | Short paired reads | HIGH throughput | Human WGS, mRNA and miRNA, de novo transcriptome, exome, ChIP-seq, short amplicons, methylation, clinical samples
  Ion Torrent PGM, Ion Proton, Ion S5 XL | Short single-end reads | FAST throughput | mRNA and miRNA, exome, ChIP-seq, short amplicons, gene panels, clinical samples
  PacBio RSII | Ultra-long reads | FAST throughput | Long amplicons, re-sequencing, de novo sequencing, novel isoform discovery, fusion transcript analysis, haplotype phasing
SAMPLE QUALITY REQUIREMENTS
Making an NGS library • DNA QC – paramount importance • Shearing & size selection • Ligation of sequencing adaptors, technology specific • Amplification
Garbage in – garbage out: sequencing success depends 90% on sample quality. Before samples are submitted, send us: • the gel picture (DNA) • 260/280 and 260/230 readings (DNA) • BioAnalyzer readings (RNA)
Reading gel pictures of genomic DNA • Protein contamination: apply phenol-chloroform extraction • RNA contamination: apply RNase, followed by phenol-chloroform extraction • Phenol carry-over or overloaded sample? If unsure, make dilution series • If problem persists: try MoBio clean-up kit, or re-extract DNA
What do absorption ratios tell us? • Pure DNA 260/280: 1.8 – 2.0 – < 1.8: too little DNA compared to other components of the solution; presence of organic contaminants: proteins and phenol; glycogen (absorb at 280 nm) – > 2.0: high share of RNA • Pure DNA 260/230: 2.0 – 2.2 – < 2.0: salt contamination, humic acids, peptides, aromatic compounds, polyphenols, urea, guanidine, thiocyanates (latter three are common kit components); these absorb at 230 nm – > 2.2: high share of RNA, very high share of phenol, high turbidity, dirty instrument, wrong blank • Photometrically active contaminants: phenol, polyphenols, EDTA, thiocyanate, protein, RNA, nucleotides (fragments below 5 bp)
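The thresholds on this slide can be applied as a simple screening step before submitting a sample. The following is a minimal sketch: the function name, the warning wording and the two example samples are assumptions made for illustration, while the cut-off values are the ones given on the slide.

```python
def interpret_absorbance_ratios(a260_280, a260_230):
    """Return a list of warnings for a DNA sample, using the slide's thresholds."""
    warnings = []
    if a260_280 < 1.8:
        warnings.append("260/280 below 1.8: possible protein, phenol or other organic contaminants")
    elif a260_280 > 2.0:
        warnings.append("260/280 above 2.0: high share of RNA")
    if a260_230 < 2.0:
        warnings.append("260/230 below 2.0: possible salts, humic acids, polyphenols or kit components")
    elif a260_230 > 2.2:
        warnings.append("260/230 above 2.2: possible RNA, phenol, turbidity, dirty instrument or wrong blank")
    return warnings

# Hypothetical readings for two samples.
for sample, ratios in {"clean": (1.9, 2.1), "salty": (1.85, 1.4)}.items():
    print(sample, interpret_absorbance_ratios(*ratios) or "OK")
```

An empty list means both ratios fall inside the ranges quoted for pure DNA; it does not replace the gel picture, which catches degradation that the ratios cannot show.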
How to make a correct measurement [Figure: DNA solution, from low concentration to high concentration] • Thaw DNA completely • Mix gently (never vortex!) • Put the sample on a thermoblock: 37°C, 15-30 min • Mix gently • Dilute 1:100 (if HMW) • Mix gently • Make a measurement with an appropriate blank • NANODROP is bad.
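After a 1:100 dilution, the reading has to be scaled back to the stock concentration. The sketch below assumes a spectrophotometric A260 reading and the widely used conversion of 50 ng/µL of double-stranded DNA per absorbance unit at a 1 cm path length; neither the conversion factor nor the function name comes from the slide.

```python
def dsdna_concentration_ng_per_ul(a260, dilution_factor=1, path_length_cm=1.0):
    """Estimate the stock dsDNA concentration from a blank-corrected A260 reading.

    Assumes 1 absorbance unit at 260 nm corresponds to 50 ng/uL of dsDNA
    at a 1 cm path length.
    """
    return a260 * 50.0 * dilution_factor / path_length_cm

# Hypothetical reading of 0.25 on a 1:100 dilution.
print(dsdna_concentration_ng_per_ul(0.25, dilution_factor=100), "ng/uL in the stock")
```

Because the result scales linearly with the dilution factor, any pipetting error in the dilution step carries straight through to the estimate, which is one reason for the repeated gentle mixing.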