Next Generation Sequencing The basics Wilfred van IJcken Erasmus - PowerPoint PPT Presentation

Next Generation Sequencing: The basics
Wilfred van IJcken, Erasmus MC Center for Biomics
Biomedical Research Techniques (XVIth ed.), Nov 6

Learning objectives
Next generation sequencing (NGS): The basics
- Background
- Illumina sequencing technology
- Terminology
Next presentation
- Research applications
- Diagnostic applications
- Future directions

What is next generation sequencing?
- Sequencing technology developed after Sanger
- Millions of reads in parallel (MPS)
- Shorter (<400 bp) sequencing reads
- Enables analysis of complex mixtures of DNA or RNA
- Enables genome wide approach
- Different vendors with different approaches
- MPS = massive parallel sequencing

NGS systems on the market
System classes: Desktop, High Throughput, Special
Different characteristics:
- Sequencing technology
- Read length
- Speed
- Output
- Applications
- Run cost

Illumina systems
[Figure: Illumina instruments arranged by data amount per run (from 8 Gb up to 6 Tb), run costs and purchase cost. Instruments shown: MiniSeq, MiSeq, NextSeq 500, HiSeq 2500, HiSeq 4000, NovaSeq 6000, HiSeq X Ten.]

NGS flow
Intake -> Isolate -> Library -> Sequence -> Report
[Diagram keywords: ID, sex, disease; blood, plasma, saliva, FFPE, cells; DNA or RNA; yield, quality, amount; enzymes; select region of interest; PCR, capture; chemistry; signal; variation detection; match phenotype?]

DNA library prep

Sequencing by Synthesis: cluster generation
[Figure labels: lane, flowcell]

Bridge amplification

Sequencing incorporated

Sequencing and basecalling: Read 1
[Figure: image acquisition for A, G, T and C over cycles 1 2 3 4 5 6 7 8 9 …]
Base calling: C A A G T A A C …
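The idea of turning per-cycle images into bases can be sketched with a toy example. This is only an illustration under strong assumptions: one cluster, one made-up intensity value per channel per cycle, and the brightest channel wins. Real base callers also correct for effects such as channel crosstalk and phasing, which are ignored here. The names `CHANNELS`, `call_bases` and `example_intensities` are hypothetical.

```python
# Toy base caller: one cluster, one intensity value per channel per cycle.
# All intensity values below are invented for illustration.
CHANNELS = ("A", "G", "T", "C")  # channel order as listed on the slide


def call_bases(intensities):
    """Return the base with the strongest signal in each cycle."""
    calls = []
    for cycle in intensities:
        strongest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        calls.append(CHANNELS[strongest])
    return "".join(calls)


# Eight cycles of hypothetical (A, G, T, C) intensities.
example_intensities = [
    (0.1, 0.2, 0.1, 0.9),
    (0.8, 0.1, 0.2, 0.1),
    (0.9, 0.2, 0.1, 0.1),
    (0.1, 0.7, 0.2, 0.2),
    (0.2, 0.1, 0.8, 0.1),
    (0.9, 0.1, 0.1, 0.2),
    (0.7, 0.2, 0.2, 0.1),
    (0.1, 0.1, 0.3, 0.8),
]

print(call_bases(example_intensities))  # CAAGTAAC
```

The made-up intensities were chosen so that the result matches the example read on the slide.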

Single-end, paired-end, index read
[Figure labels: Index read, Single read, Paired end read; example index GATCG]
- Single read = sequence from one side of the fragment
- Paired end = sequence from both sides of the fragment
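A minimal sketch of the difference between a single read and a paired-end read, assuming the usual convention that the second read is taken from the opposite strand, so it is the reverse complement of the far end of the fragment. The fragment sequence and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_reads(fragment, read_length, paired=True):
    """Return (read1,) for a single read or (read1, read2) for paired end."""
    read1 = fragment[:read_length]
    if not paired:
        return (read1,)
    # Read 2 starts at the other end of the fragment, on the opposite strand.
    read2 = reverse_complement(fragment)[:read_length]
    return (read1, read2)


fragment = "ACGTTGCAAGGC"  # hypothetical library fragment
print(simulate_reads(fragment, 4, paired=False))  # ('ACGT',)
print(simulate_reads(fragment, 4))                # ('ACGT', 'GCCT')
```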

Indexing enables sample multiplexing
[Figure: Patients 1 to 4, each tagged with a different index; index sequences shown: GATCG, CGTGA, ATCGG, TCTCT]
Index = different nucleic acid code per sample
- introduced during sample prep
- read during index read
Enables multiple samples in one flowcell lane
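How the index read is used to separate samples again (demultiplexing) can be sketched as follows. The index sequences are the ones shown on the slide; the sample names are hypothetical labels, because the transcript does not show unambiguously which patient carries which index. Matching is exact here, which is an assumption of this sketch; production tools usually tolerate a limited number of mismatches.

```python
from collections import defaultdict

# Index sequences from the slide; sample names are hypothetical.
SAMPLE_SHEET = {
    "GATCG": "sample_1",
    "CGTGA": "sample_2",
    "ATCGG": "sample_3",
    "TCTCT": "sample_4",
}


def demultiplex(reads, sample_sheet=SAMPLE_SHEET):
    """Group (index_read, sequence) pairs by sample.

    Reads whose index is not in the sample sheet go to 'undetermined'.
    """
    bins = defaultdict(list)
    for index_read, sequence in reads:
        sample = sample_sheet.get(index_read, "undetermined")
        bins[sample].append(sequence)
    return dict(bins)


# Hypothetical reads from one flowcell lane.
lane_reads = [
    ("GATCG", "CAAGTAAC"),
    ("TCTCT", "TGCTACGA"),
    ("GATCG", "TTCGACTG"),
    ("GATCC", "AGCCTTTT"),  # index with one error: not assigned by exact matching
]

for sample, sequences in demultiplex(lane_reads).items():
    print(sample, sequences)
```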

Sequence Index 1

Sequence Index 2

Sequence Read 2
[Figure: image acquisition over cycles 1 2 3 4 5 6 7 8 9 …]
Called bases: C A A G T A A C …

Summary sequencing technology
Reads in the order they are sequenced on the preceding slides: Read 1, Index 1, Index 2, Read 2

Simplified RNA sample preparation
[Figure: RNA is converted to DNA by reverse transcriptase; Adaptor 1 and Adaptor 2 are shown on the DNA]

Output file from basecalling
- Many file types: qseq, fastq, etc.
- Each system has its own format.
- Large file sizes: >400 million reads per lane
[Figure: annotated output record with the fields Instrument, Run ID, Lane, Tile, X-coord, Y-coord, Index #, Read #, Sequence, Q-score (as ASCII characters), PF (0,1)]
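As an illustration of how the ASCII characters in such a file map to Q-scores, here is a minimal FASTQ reader. It assumes simple four-line records and an ASCII offset of 33 (Phred+33); some older pipelines used an offset of 64, so the offset is a parameter. The function name `parse_fastq` and the example record are hypothetical.

```python
import io


def parse_fastq(handle, offset=33):
    """Yield (name, sequence, q_scores) from four-line FASTQ records.

    offset is the ASCII offset of the quality encoding (33 assumed here).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        q_scores = [ord(ch) - offset for ch in quality]
        yield header[1:], sequence, q_scores


# One made-up record: the quality string has one character per base.
example = io.StringIO("@read_1\nCAAGTAAC\n+\nIIII5+!J\n")
records = list(parse_fastq(example))

for name, sequence, q_scores in records:
    print(name, sequence, q_scores)
```

With offset 33, the character `I` decodes to Q40 and `!` to Q0.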

Data analysis not trivial due to data volumes and complexity

HiSeq 2000, 200G run

| Data | Total volume | Final volume | Comment |
| Image Data | 32 TB | 0 | |
| Intensity Data | 2 TB | 0 | Optionally transferred |
| Base Call / Quality Score Data | 0.25 TB | 0.25 TB | 1 byte/base (raw), assuming qseq generation offline |
| Alignment Output | 6 TB (3 TB) | 1.2 TB | Remove intermediate files |

GA IIx, 50G run

| Data | Total volume | Final volume | Comment |
| Image Data | 6.9 TB | 0 | Optionally transferred |
| Intensity Data | 0.93 TB | 0.93 TB | |
| Base Call / Quality Score Data | 0.17 TB | 0.17 TB | |
| Alignment Output | 1.2 TB | 1.2 TB | |

Callouts on the slide:
- 150 M reads x 8 lanes x 100 bp x 2 (paired end) = 240 Gbp
- Storage and compute needed
- Core facilities
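The yield calculation on this slide can be reproduced in a few lines. The sketch below uses the numbers from the slide (150 M reads per lane, 8 lanes, 100 bp, paired end) and the slide's "1 byte/base (raw)" figure for base call and quality data. Treating 1 TB as 10^12 bytes is an assumption, and the function names are hypothetical.

```python
def run_yield_bases(reads_per_lane, lanes, read_length, paired_end=True):
    """Total number of sequenced bases in one run."""
    reads_per_cluster = 2 if paired_end else 1
    return reads_per_lane * lanes * read_length * reads_per_cluster


def storage_tb(total_bases, bytes_per_base=1.0):
    """Storage in TB, assuming 1 TB = 1e12 bytes."""
    return total_bases * bytes_per_base / 1e12


total = run_yield_bases(150_000_000, 8, 100)
print(total / 1e9)        # 240.0 (Gbp)
print(storage_tb(total))  # 0.24 (TB at 1 byte per base)
```

At 1 byte per base, 240 Gbp comes to about 0.24 TB, which is in line with the 0.25 TB listed for base call and quality score data.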

Terminology
Next generation sequencing, AKA:
- Deep sequencing
- MPS = massive parallel sequencing
Number of sequencing cycles = read length
[Figure: one cluster followed over sequencing cycles 1 2 3 4 5 6 7 8 9 …, giving the read T G C T A C G A T …]

Alignment, Mapping
Reference  AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGAACGCCGCTAGCTAGGCGC
Consensus  AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGATCGCCGCTAGCTAGGCGC
Reads                TAGCCTTTTTTCGACTGTCGAGTGGATCGCCG
                      AGCCTTTTTTCGACTGTCGAGTGGATCGCCGC
                       GCCTTTGTTCGACTGTCGAGTGGATCGCCGCT
                        CCTTTGTTCGACTGTCGAGTGGATCGCCGCTA
- Position 19: heterozygous SNP (two reads show T and two show G; reference and consensus show T)
- Position 37: mismatch (reference shows A; all reads and the consensus show T)
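The alignment on this slide can be turned into per-position base counts (a pileup) to find where reads disagree with the reference. The sequences below are copied from the slide, split at the two highlighted bases. The 0-based start positions were derived by matching each read against the reference and are an assumption of this sketch, as are the function names.

```python
from collections import Counter

# Reference from the slide, split at the two highlighted positions.
REFERENCE = "AAAACGCGCTTAGCCTTT" + "T" + "TTCGACTGTCGAGTGGA" + "A" + "CGCCGCTAGCTAGGCGC"

# (0-based start on the reference, read sequence)
ALIGNED_READS = [
    (10, "TAGCCTTT" + "T" + "TTCGACTGTCGAGTGGATCGCCG"),
    (11, "AGCCTTT" + "T" + "TTCGACTGTCGAGTGGATCGCCGC"),
    (12, "GCCTTT" + "G" + "TTCGACTGTCGAGTGGATCGCCGCT"),
    (13, "CCTTT" + "G" + "TTCGACTGTCGAGTGGATCGCCGCTA"),
]


def pileup(reference, aligned_reads):
    """Count the bases observed at every reference position."""
    columns = [Counter() for _ in reference]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            columns[start + offset][base] += 1
    return columns


def variant_positions(reference, columns):
    """Return {0-based position: base counts} where any read differs from the reference."""
    return {
        pos: dict(counts)
        for pos, counts in enumerate(columns)
        if any(base != reference[pos] for base in counts)
    }


columns = pileup(REFERENCE, ALIGNED_READS)
for pos, counts in variant_positions(REFERENCE, columns).items():
    print(pos + 1, REFERENCE[pos], counts)  # 1-based position, reference base, read counts
```

A position where the reads split between two bases is a candidate heterozygous SNP, while a position where all reads agree on a non-reference base is a plain mismatch against the reference.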

Read depth
Aka depth of coverage
[Figure: depth values 1, 5 and 7 are marked at example positions]
Consensus  AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGATCGCCGCTAGCTAGGCGC
Reads                TAGCCTTTTTTCGACTGTCGAGTGGATCGCCG
                      AGCCTTTTTTCGACTGTCGAGTGGATCGCCGC
                       GCCTTTGTTCGACTGTCGAGTGGATCGCCGCT
                        CCTTTGTTCGACTGTCGAGTGGATCGCCGCTA
                                 GACTGTCGAGTGGATCGCCGCTAGCTAGG
                                   CTGTCGAGTGGATCGCCGCTAGCTAGG
- Average read depth can differ a lot from the read depth at a given position!
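Per-position read depth and average depth can be computed from the start and length of each aligned read. The sketch below uses the six reads transcribed from this slide; their 0-based start positions were derived by matching each read against the 54-base consensus and are an assumption, and the figure itself may show more reads than the transcript does. The names are hypothetical.

```python
def depth_per_position(reference_length, alignments):
    """Return a list with the number of reads covering each position."""
    depth = [0] * reference_length
    for start, length in alignments:
        for pos in range(start, start + length):
            depth[pos] += 1
    return depth


# (0-based start, read length) for the six reads transcribed from the slide.
ALIGNMENTS = [(10, 32), (11, 32), (12, 32), (13, 32), (22, 29), (24, 27)]

depth = depth_per_position(54, ALIGNMENTS)
average_depth = sum(depth) / len(depth)

print(depth)
print(round(average_depth, 2), min(depth), max(depth))
```

For these reads the average is about 3.4, while individual positions range from 0 to 6, which is the point the slide makes about average depth versus depth at a position.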

Accuracy, error rate, quality score
- Single base error rate = total number of mismatched bases found in mapped sequence reads from a sequencing run, divided by the mappable yield.
- Quality scores (Q scores / phred scores)
  - derived from an examination of the intensity peaks around each base
  - range from 0 to 41, higher corresponds to higher quality
  - Q = -10 log10(p), where p is the basecall error probability

| Quality score | Probability of incorrect base call | Base call accuracy |
| 10 (Q10) | 1 in 10 | 90% |
| 20 (Q20) | 1 in 100 | 99% |
| 30 (Q30) | 1 in 1000 | 99.9% |
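The relation between Q-score and error probability, and the error-rate definition, can be checked with a short sketch. The formulas are the ones given on the slide; the function names and the example numbers for the error rate are hypothetical.

```python
import math


def phred_from_error_probability(p):
    """Q = -10 * log10(p), with p the basecall error probability."""
    return -10 * math.log10(p)


def error_probability_from_phred(q):
    """Inverse of the formula above: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def single_base_error_rate(mismatched_bases, mappable_yield):
    """Mismatched bases in mapped reads divided by the mappable yield (in bases)."""
    return mismatched_bases / mappable_yield


for q in (10, 20, 30):
    p = error_probability_from_phred(q)
    print(f"Q{q}: error probability {p:g}, accuracy {100 * (1 - p):g}%")

# Hypothetical run: 2 million mismatches in 1 billion mapped bases.
print(single_base_error_rate(2_000_000, 1_000_000_000))  # 0.002
```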

Traditional vs NextGen Sequencing
- Sanger sequencing: 1 sequence read per basepair
- NGS: multiple sequence reads per basepair

Erasmus Center for Biomics Genomics core facility at ErasmusMC www.biomics.nl w.vanijcken@erasmusmc.nl LNA
