Next Generation Sequencing in Molecular Diagnostics, Wilfred van IJcken

  1. Next Generation Sequencing in Molecular Diagnostics
     Wilfred van IJcken, PhD, Erasmus MC Center for Biomics
     Molecular Diagnostics Course XI, Nov 2, 2017

  2. Learning objectives
     - Next generation sequencing (NGS): the basics
        - Illumina sequencing technology
        - Terminology
        - Enrichment technology
     - Clinical applications
        - Targeted gene panels vs exome vs whole genome
        - NIPT
     - Future directions

  3. Next next next generation sequencing…
     - 1st generation sequencing: amplified multiple-molecule sequencing
        - Sanger sequencing
     - 2nd generation sequencing: amplified single-molecule sequencing
        - 454 sequencing: Roche
        - SBS (sequencing by synthesis): Illumina
        - SOLiD sequencing: Applied Biosystems / Life Technologies
        - Ion Torrent: Life Technologies
     - 3rd generation sequencing: single-molecule sequencing
        - Helicos tSMS
        - PacBio SMRT (real-time DNA sequencing)
        - Oxford Nanopore Technologies

  4. NGS systems on the market (figure: desktop, high-throughput and special systems)

  5. Sequence technology dynamics (figure: desktop, high-throughput and special systems)

  6. What is next generation sequencing?
     - Sequencing technology developed after Sanger sequencing
     - Millions of reads sequenced in parallel (MPS = massively parallel sequencing)
     - Shorter sequencing reads (<400 bp)
     - Enables analysis of complex mixtures of DNA or RNA
     - Enables a genome-wide approach
     - Different vendors with different approaches

  7. NGS flow (figure): Intake → Isolate → Library → Sequence → Report
     - Intake: ID, sex, disease
     - Isolate: DNA or RNA from blood, plasma, saliva, FFPE or cells; yield, quality, amount
     - Library: select region of interest (enzymes, PCR, capture)
     - Sequence: chemistry, signal
     - Report: variation detection; match phenotype?

  8. Illumina systems (figure: HiSeq X Ten, NovaSeq 6000, HiSeq 4000, HiSeq 2500, NextSeq 500, MiSeq and MiniSeq, compared by data amount per run (from 6 Tb down to 8 Gb), run costs and purchase cost)

  9. Simplified sample preparation (figure): DNA, or RNA first converted by reverse transcriptase, is fitted with Adaptor 1 and Adaptor 2 at either end

  10. Bridge amplification: each DNA molecule hybridizes at a different location in the flowcell lane

  11. Clustering and sequencing (figure): cluster growth, then sequencing cycles 1, 2, 3, … with image acquisition and base calling (e.g. T G C T A C G A T …). Each base has a different fluorescent dye coupled to it.
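A minimal sketch of the base-calling step, assuming the simplified four-dye model on the slide (one fluorescent dye per base, one intensity per dye per cycle). The channel order and intensities are hypothetical, and real base callers also correct for phasing and channel cross-talk, which is left out here.

```python
# Simplified four-dye model: channel order A, C, G, T (an assumption).
BASES = "ACGT"


def call_bases(cycles):
    """Call one base per sequencing cycle for a single cluster.

    cycles: one (A, C, G, T) intensity tuple per cycle, taken from the image
    acquired after that cycle. The brightest channel gives the base.
    """
    sequence = []
    for intensities in cycles:
        brightest = max(range(len(BASES)), key=lambda i: intensities[i])
        sequence.append(BASES[brightest])
    return "".join(sequence)


# Hypothetical intensities for the first three cycles of one cluster.
cluster = [
    (0.05, 0.10, 0.08, 0.91),  # T channel brightest
    (0.12, 0.07, 0.88, 0.10),  # G channel brightest
    (0.09, 0.85, 0.11, 0.06),  # C channel brightest
]
print(call_bases(cluster))
```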

  12. Output file from base calling
     - Many file types: qseq, FASTQ, etc.
     - Each system has its own format
     - Large file sizes: ~150 million reads per lane
     Fields per read (figure): instrument, run ID, lane, tile, X-coordinate, Y-coordinate, index #, read #, PF (passed filter, 0/1), sequence, and Q-score encoded as one ASCII character per base
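The header fields and per-base ASCII quality characters can be illustrated with a FASTQ record. This sketch assumes the CASAVA 1.8-style Illumina header (`@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`) and Phred+33 quality encoding; the record is made up, and older formats such as qseq lay the fields out differently.

```python
def parse_fastq_record(lines):
    """Parse one 4-line FASTQ record with a CASAVA 1.8-style Illumina header."""
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    read_id, description = header[1:].split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = read_id.split(":")
    read_number, filtered, _control, index = description.split(":")
    return {
        "instrument": instrument,
        "run": int(run),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "read": int(read_number),          # 1 = first read, 2 = second (paired) read
        "passed_filter": filtered == "N",  # "Y" means the read was filtered out
        "index": index,
        "sequence": sequence,
        # Phred+33: each ASCII character encodes Q = ord(char) - 33
        "qualities": [ord(c) - 33 for c in quality],
    }


# Made-up record; header fields follow the CASAVA 1.8 layout.
record = parse_fastq_record([
    "@EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:ATCACG",
    "TGCTACGAT",
    "+",
    "IIIII?5+#",
])
print(record["lane"], record["tile"], record["index"], record["qualities"])
```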

  13. Data analysis is not trivial due to data volumes and complexity
     - Need data storage and compute to handle up to petabytes of data
     - Core facilities needed

     Data volume                       Total         Final     Comment
     HiSeq 2000, 200G run
       Image data                      32 TB         0
       Intensity data                  2 TB          0         Optionally transferred
       Base call / quality score data  0.25 TB       0.25 TB   1 byte/base (raw), assuming qseq generation offline
       Alignment output                6 TB (3 TB)   1.2 TB    Remove intermediate files
     GA IIx, 50G run
       Image data                      6.9 TB        0
       Intensity data                  0.93 TB       0.93 TB   Optionally transferred
       Base call / quality score data  0.17 TB       0.17 TB
       Alignment output                1.2 TB        1.2 TB

  14. Terminology
     - Next generation sequencing, also known as:
        - Deep sequencing
        - MPS = massively parallel sequencing
     - Cluster; read (e.g. T G C T A C G A T …)
     - Number of sequencing cycles = read length

  15. Single-end, paired-end and index reads
     - Single read = sequence from one side of the fragment
     - Paired end = sequence from both sides of the fragment
     - Index read = sequence of the sample index (e.g. GATCG)
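As a small sketch of what "both sides of the fragment" means: read 1 covers the start of the fragment, and read 2 is sequenced from the opposite end, so it reads the reverse complement. The function names and the example fragment are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G, then reversed)."""
    return seq.translate(COMPLEMENT)[::-1]


def single_read(fragment, read_length):
    """Single read: sequence from one side of the fragment only."""
    return fragment[:read_length]


def paired_end_reads(fragment, read_length):
    """Paired end: read 1 from one side, read 2 from the other side.

    Read 2 is primed from the opposite end, so it is the reverse
    complement of the fragment's last read_length bases.
    """
    return single_read(fragment, read_length), reverse_complement(fragment[-read_length:])


fragment = "GATTTAGATCGCGATAGAG"  # illustrative fragment
print(single_read(fragment, 5))
print(paired_end_reads(fragment, 5))
```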

  16. Indexing enables sample multiplexing
     - Index = a different nucleic acid code per sample, introduced during sample prep and read during the index read
     - Enables multiple samples in one flowcell lane
     Example indexes (figure): Patient 1 GATCG, Patient 2 CGTGA, Patient 3 ATCGG, Patient 4 TCTCT
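Demultiplexing assigns each read to a patient from its index read. A minimal sketch using the four index sequences shown on the slide, assuming one tolerated mismatch (a common default, but an assumption here):

```python
# The four index sequences shown on the slide.
SAMPLE_INDEXES = {
    "Patient 1": "GATCG",
    "Patient 2": "CGTGA",
    "Patient 3": "ATCGG",
    "Patient 4": "TCTCT",
}


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, indexes=SAMPLE_INDEXES, max_mismatches=1):
    """Return the sample whose index matches the index read, or None.

    A read is assigned only when exactly one index is within max_mismatches,
    so ambiguous or unmatched index reads stay unassigned.
    """
    hits = [
        sample
        for sample, index in indexes.items()
        if len(index) == len(index_read) and hamming(index, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


for index_read in ("GATCG", "GATCC", "AAAAA"):
    print(index_read, "->", assign_sample(index_read))
```

Every pair of these indexes differs in at least three positions, so a single sequencing error in the index read still points to exactly one patient.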

  17. Alignment, mapping
     Reference  AAAACGCGCTTAGCCTTT T TTCGACTGTCGAGTGGA A CGCCGCTAGCTAGGCGC
     Reads                TAGCCTTT T TTCGACTGTCGAGTGGA T CGCCG
                           AGCCTTT T TTCGACTGTCGAGTGGA T CGCCGC
                            GCCTTT G TTCGACTGTCGAGTGGA T CGCCGCT
                             CCTTT G TTCGACTGTCGAGTGGA T CGCCGCTA
     Consensus  AAAACGCGCTTAGCCTTT T TTCGACTGTCGAGTGGA T CGCCGCTAGCTAGGCGC
     Highlighted positions: heterozygous SNP (reads show T and G) and mismatch (reference A, all reads T)
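The two highlighted positions on the slide can be reproduced with a pileup: count the bases the aligned reads show at each reference position and compare the non-reference allele fraction with the reference base. The start positions below are read off the slide's alignment; the 20%/80% allele-fraction cut-offs are hypothetical and far simpler than a real variant caller such as GATK.

```python
from collections import Counter

REFERENCE = "AAAACGCGCTTAGCCTTTTTTCGACTGTCGAGTGGAACGCCGCTAGCTAGGCGC"
# (1-based start on the reference, read sequence), read off the slide.
READS = [
    (11, "TAGCCTTTTTTCGACTGTCGAGTGGATCGCCG"),
    (12, "AGCCTTTTTTCGACTGTCGAGTGGATCGCCGC"),
    (13, "GCCTTTGTTCGACTGTCGAGTGGATCGCCGCT"),
    (14, "CCTTTGTTCGACTGTCGAGTGGATCGCCGCTA"),
]


def pileup(position, reads=READS):
    """Count the bases the aligned reads show at a 1-based reference position."""
    counts = Counter()
    for start, seq in reads:
        offset = position - start
        if 0 <= offset < len(seq):
            counts[seq[offset]] += 1
    return counts


def call_genotype(position, reference=REFERENCE, reads=READS, het_range=(0.2, 0.8)):
    """Toy allele-fraction caller; the het_range cut-offs are hypothetical."""
    counts = pileup(position, reads)
    depth = sum(counts.values())
    if depth == 0:
        return "no coverage"
    ref_base = reference[position - 1]
    alt_base, alt_count = max(
        ((base, n) for base, n in counts.items() if base != ref_base),
        key=lambda item: item[1],
        default=(None, 0),
    )
    fraction = alt_count / depth
    if fraction < het_range[0]:
        return f"{ref_base}/{ref_base} (matches reference)"
    if fraction <= het_range[1]:
        return f"{ref_base}/{alt_base} (heterozygous SNP)"
    return f"{alt_base}/{alt_base} (mismatch with reference on both alleles)"


for position in (19, 37):
    print(position, dict(pileup(position)), call_genotype(position))
```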

  18. Read depth, aka depth of coverage (figure: depth markers 1, 5 and 7 at three positions along the stack)
     AAAACGCGCTTAGCCTTT T TTCGACTGTCGAGTGGA T CGCCGCTAGCTAGGCGC
               TAGCCTTT T TTCGACTGTCGAGTGGA T CGCCG
                AGCCTTT T TTCGACTGTCGAGTGGA T CGCCGC
                 GCCTTT G TTCGACTGTCGAGTGGA T CGCCGCT
                  CCTTT G TTCGACTGTCGAGTGGA T CGCCGCTA
                             GACTGTCGAGTGGA T CGCCGCTAGCTAGG
                               CTGTCGAGTGGA T CGCCGCTAGCTAGG
     Average read depth can differ a lot from the read depth at a given position!
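A sketch of per-position depth versus average depth, using the start positions and lengths of the six stacked reads on the slide over the 54-base region (the consensus line itself is not counted):

```python
def depth_per_position(reads, region_length):
    """Read depth at each 1-based position 1..region_length.

    reads: (start, length) pairs, with start as a 1-based reference position.
    """
    depth = [0] * (region_length + 1)  # index 0 is unused
    for start, length in reads:
        for pos in range(max(start, 1), min(start + length, region_length + 1)):
            depth[pos] += 1
    return depth[1:]


# (start, length) of the six reads on the slide, on a 54-base region.
READS = [(11, 32), (12, 32), (13, 32), (14, 32), (23, 29), (25, 27)]
depths = depth_per_position(READS, region_length=54)

average = sum(depths) / len(depths)
print(f"average depth {average:.1f}x, min {min(depths)}x, max {max(depths)}x")
print("depth at position 19:", depths[19 - 1], "| at position 37:", depths[37 - 1])
```

For these reads the average depth over the region is about 3.4x, while individual positions range from 0x at the edges to 6x in the middle.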

  19. Accuracy, error rate, quality score
     - Single base error rate = total number of mismatched bases found in mapped sequence reads from a sequencing run, divided by the mappable yield
     - Quality scores (Q scores / Phred scores)
        - derived from an examination of the intensity peaks around each base
        - range from 0 to 41; higher corresponds to higher quality
        - Q = -10 log10(p), where p is the base call error probability

       Quality score   Probability of incorrect base call   Base call accuracy
       10 (Q10)        1 in 10                              90%
       20 (Q20)        1 in 100                             99%
       30 (Q30)        1 in 1000                            99.9%
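The Phred relation on the slide, Q = -10 log10(p), inverts to p = 10^(-Q/10). A short sketch that converts between the two, reproduces the table rows, and writes out the single base error rate definition:

```python
import math


def phred_to_error_probability(q):
    """Invert Q = -10 * log10(p): p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Q = -10 * log10(p) for a base call error probability p."""
    return -10 * math.log10(p)


def single_base_error_rate(mismatched_bases, mappable_yield):
    """Mismatched bases in mapped reads divided by the mappable yield (in bases)."""
    return mismatched_bases / mappable_yield


for q in (10, 20, 30):
    p = phred_to_error_probability(q)
    print(f"Q{q}: 1 in {round(1 / p)} incorrect, accuracy {100 * (1 - p):.1f}%")
```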

  20. NGS systems on the market (figure: desktop, high-throughput and special systems), with different characteristics: sequencing technology, read length, speed, output, applications, run cost

  21. NGS applications
     - Whole genome de novo sequencing
     - Epigenetic profiling (DNA methylation)
     - Gene expression analysis
     - Discovery of novel transcripts, splice variants, miRNAs
     - Protein-DNA/RNA interactions (ChIP-seq)
     - Genomic DNA interactions (3C-, 4C-, 5C-seq)
     - Clinical use: targeted DNA sequencing, exome sequencing, whole genome re-sequencing

  22. Diagnostic applications
     - Targeted sequencing: cardiomyopathies, ciliopathies, cancer hotspot panel, Noonan syndrome, neurodegenerative diseases, …
     - Exome sequencing: unknown disease, de novo mutations
     - Whole genome sequencing: unknown disease, non-exonic variants
     - Non-invasive diagnostics: prenatal plasma, T21 testing (NIPT)
     - Cancer sequencing: germline mutations, therapy
     - HLA typing: transplantation

  23. Enrichment technology: exome = all coding regions (~exons) of the genome

  24. Choose your baits
     - Agilent, NimbleGen (Roche), Illumina, IDT, …: exome, panel or other targets
     - CRE: boosted coverage for ~5000 clinically relevant genes (CRE halo V4)
     - Exome performance:
        - Target coverage: >20x coverage for 95% of genes
        - Even coverage: read depth distribution
        - Specificity of capture: false positive / negative variants, high-homology genes

  25. Exome data analysis overview
     - Mapping: mapping %, on/off target
     - Coverage: % >20x, min, max, bases not sequenced
     - Sanger: add Sanger amplicons for bases <20x
     - Variants: GATK (SNPs + indels), plus low-frequency variants and indels
     - Filtering and annotation: >100 databases, function
     - Copy number: ExomeDepth
     - Inheritance: dominant, recessive, etc.
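A minimal sketch of the coverage step, assuming a list of per-base read depths over the targeted bases: it reports the percentage at or above 20x, minimum and maximum depth, and unsequenced bases, and it merges the bases below 20x into intervals that could be filled in with Sanger amplicons. The function names and example depths are hypothetical.

```python
def coverage_summary(depths, threshold=20):
    """Coverage metrics over the targeted bases of a gene or exome.

    depths: read depth at every targeted base (position 0, 1, 2, ...).
    """
    if not depths:
        raise ValueError("no targeted bases")
    low = [i for i, d in enumerate(depths) if d < threshold]
    return {
        "pct_at_or_above_threshold": 100 * (len(depths) - len(low)) / len(depths),
        "min": min(depths),
        "max": max(depths),
        "not_sequenced": sum(1 for d in depths if d == 0),
        "below_threshold": low,
    }


def merge_into_intervals(positions):
    """Merge sorted positions into (start, end) runs, e.g. for Sanger amplicons."""
    runs = []
    for pos in positions:
        if runs and pos == runs[-1][1] + 1:
            runs[-1][1] = pos
        else:
            runs.append([pos, pos])
    return [tuple(run) for run in runs]


# Hypothetical depths for ten targeted bases.
depths = [25, 30, 18, 12, 0, 22, 40, 19, 35, 50]
summary = coverage_summary(depths)
print(summary["pct_at_or_above_threshold"], summary["min"], summary["max"])
print(merge_into_intervals(summary["below_threshold"]))
```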

  26. Quality  High throughput  ISO 15189/17025 accredition needed for clinical use in NL  Sample swap is a real possibility  Spike-in to uniquely identify each sample after sequencing Spike-in Sequencing Shear Capture A1 QC QC B1 C1

  27. What does a targeted sequencing result look like? (figure)

  28. Zoom-in on the sequencing result (figure)

  29. Variation is not only SNPs
     - SNPs: ~0.1% of the genomes of any two individuals differ due to SNPs
       e.g. GATTTAGATCGCGATAGAG vs GATTTAGATCTCGATAGAG
     - Short indels
       e.g. GATTTAGATCTCGATAGAG vs GATT------------GAG
     - Structural variants (SVs) [e.g. kb-Mb-sized deletions, insertions, inversions, fusion genes]
     - Indels and SVs are more difficult to detect than SNPs; presumably they affect >0.1% of the genome
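To make the difference between variant types concrete, the sketch below compares two aligned sequences (with '-' marking gaps, as in the slide's example sequences) and labels each difference. The 50 bp boundary between short indels and structural variants is a commonly used convention and an assumption here.

```python
def compare_aligned(reference, sample, sv_min_length=50):
    """List differences between two aligned sequences ('-' marks a gap).

    Returns (variant type, 1-based start column, length) tuples. Runs of gap
    columns are merged into one insertion or deletion event.
    """
    if len(reference) != len(sample):
        raise ValueError("aligned sequences must have equal length")
    events = []
    i = 0
    while i < len(reference):
        r, s = reference[i], sample[i]
        if r == s:
            i += 1
        elif r != "-" and s != "-":
            events.append(("SNP", i + 1, 1))
            i += 1
        else:
            # Gap in the sample = deletion; gap in the reference = insertion.
            gapped = sample if s == "-" else reference
            kind = "deletion" if s == "-" else "insertion"
            start = i
            while i < len(gapped) and gapped[i] == "-":
                i += 1
            length = i - start
            size_class = "structural variant" if length >= sv_min_length else "short indel"
            events.append((f"{size_class} ({kind})", start + 1, length))
    return events


print(compare_aligned("GATTTAGATCGCGATAGAG", "GATTTAGATCTCGATAGAG"))
print(compare_aligned("GATTTAGATCTCGATAGAG", "GATT------------GAG"))
```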

  30. Recent case report
     - 2005: 5-week-old girl hospitalized with RS virus, on artificial respiration
     - 2008: developmental delay, possibly due to brain damage from hypoxia
     - 2011: re-evaluation by clinical geneticist: possibly Sotos syndrome. SNP array, Sanger sequencing of NSD1 and PTEN, AOA, FraX and metabolic testing: negative
     - 2015: re-evaluation: speech affected. WES trio, filtered for ID genes: de novo c.1216C>T, p.Gln406* mutation in MECP2 → atypical form of Rett syndrome
     - 2016: Rett specialist: 5 other girls found with atypical Rett syndrome with C-terminal frameshift mutations in MECP2 (unpublished)
     WES helps to solve previously unsolved cases; evidence is increasing for using WES as a first-tier test
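The HGVS notation in the case report can be checked with simple arithmetic: coding position c.1216 falls in codon (1216 - 1) // 3 + 1 = 406, at the first base of that codon. Glutamine is encoded by CAA or CAG, so a C>T change at the first base turns either codon into a stop codon, which matches p.Gln406*. A small sketch (function names are illustrative):

```python
GLN_CODONS = ("CAA", "CAG")  # the two codons for glutamine
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_position(c_pos):
    """Map a coding DNA position (c.N) to (codon number, base 1-3 in that codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitute(codon, base_in_codon, ref, alt):
    """Apply a single-base substitution to a codon, checking the reference base."""
    i = base_in_codon - 1
    if codon[i] != ref:
        raise ValueError(f"{codon} has {codon[i]} at base {base_in_codon}, not {ref}")
    return codon[:i] + alt + codon[i + 1:]


codon_number, base = codon_position(1216)  # c.1216C>T
print("codon", codon_number, "base", base)
for codon in GLN_CODONS:
    mutant = substitute(codon, base, "C", "T")
    print(codon, "->", mutant, "(stop)" if mutant in STOP_CODONS else "(sense)")
```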

  31. Human disease: what to sequence?
     - Most Mendelian diseases are caused by mutations in the exome
     - The exome is only ~1.6% of the human genome (50 Mbp)

                               Panel     Exome    Whole genome
       Genome covered          >0.01%    1.6%     95%
       Sequencing (relative)   1/400x    1x       60x
       Interpretation          ++        +        +/-
       Validation              ++        +        +/-
       Speed                   ++        +        -
       Cost (est.)             €500      €700     €3000

  32. Whole genome sequencing: HiSeq X Ten or outsource? (figure: "$1000 genome" at 30x and at 40x)

  33. Comparison of exome and genome sequencing

  34. Non-invasive trisomy testing (NIPT): 10 weeks of pregnancy, 5% fetal DNA → DNA isolation → NGS preparation → analysis → trisomy report

  35. NIPT: determine fetal chromosomal copy number (figure: fetal and maternal cfDNA; chr 21 in a euploid pregnancy vs a fetal trisomy)
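One common way to call a trisomy from cfDNA read counts (not necessarily the pipeline behind this slide) is a z-score: compare the fraction of reads mapping to chromosome 21 with the same fraction in euploid reference pregnancies. With fetal fraction f, three fetal copies instead of two raise that fraction by roughly f/2, so 5% fetal DNA gives only about a 2.5% relative increase. The reference fractions, read counts and the z > 3 cut-off below are hypothetical.

```python
import statistics


def chr_fraction(chr_reads, total_reads):
    """Fraction of all mapped cfDNA reads that map to one chromosome."""
    return chr_reads / total_reads


def expected_trisomy_fraction(euploid_fraction, fetal_fraction):
    """Approximate chromosome fraction if the fetus carries a third copy.

    Only the fetal share of cfDNA gains 50% more copies, so the fraction
    rises by about fetal_fraction / 2 (ignoring the tiny change in total reads).
    """
    return euploid_fraction * (1 + fetal_fraction / 2)


def z_score(test_fraction, reference_fractions):
    """How many reference standard deviations the test lies from the reference mean."""
    mean = statistics.mean(reference_fractions)
    sd = statistics.stdev(reference_fractions)
    return (test_fraction - mean) / sd


# Hypothetical chr21 fractions from euploid reference pregnancies.
REFERENCE_CHR21 = [0.01300, 0.01305, 0.01295, 0.01302, 0.01298]

euploid_test = chr_fraction(130_100, 10_000_000)
trisomy_test = expected_trisomy_fraction(0.013, fetal_fraction=0.05)

for label, fraction in (("euploid", euploid_test), ("trisomy 21", trisomy_test)):
    z = z_score(fraction, REFERENCE_CHR21)
    call = "possible trisomy" if z > 3 else "no trisomy detected"
    print(f"{label}: chr21 fraction {fraction:.5f}, z = {z:.1f}, {call}")
```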

  36. Future of NGS

  37. MinION
     - USB-sized sequencer
     - One-time use
     - $900
     - 500 nanopores
     - >1 Gbp
     - User-defined runtime
     - Lifetime of electrodes is limiting (days)
     - No sample prep: measure directly from blood
