
NGS II Illumina Sequencing Robert Kraaij Department of Internal - PowerPoint PPT Presentation

DepthOfCoverage Genetics for Dummies 2017 NGS II Illumina Sequencing Robert Kraaij Department of Internal Medicine r.kraaij@erasmusmc.nl Overview Data Analysis Applications Example: Exome Sequencing Things to be addressed


  1. DepthOfCoverage Genetics for Dummies 2017 NGS II – Illumina Sequencing Robert Kraaij Department of Internal Medicine r.kraaij@erasmusmc.nl

  2. Overview • Data Analysis • Applications • Example: Exome Sequencing

  3. Things to be addressed • NGS: many short reads that might contain errors • data analysis will handle these reads and errors

  4. Overview • Data Analysis • Applications • Example: Exome Sequencing

  5. Illumina Sequencing bridgePCR cBot flowcell HiSeq2000

  6. Per Cycle Imaging

  7. Per Cycle Imaging G A T C

  8. Per Cycle Base Calling G G good quality poor quality

  9. Quality Scoring
     Phred Score   Incorrect base   Accuracy
     10            1 in 10          90 %
     20            1 in 100         99 %
     30            1 in 1000        99.9 %
     40            1 in 10000       99.99 %
     50            1 in 100000      99.999 %
     Phred scores 0 to 93 → ASCII 33 to 126 = single character
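
The relation in the table above can be sketched in a few lines of Python: a Phred score Q corresponds to an error probability of 10^(-Q/10), and each score is stored as one ASCII character at offset 33. The function names are illustrative and not part of the original slides.

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is incorrect."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def phred_to_char(q, offset=33):
    """Encode a Phred score (0 to 93) as a single ASCII character (33 to 126)."""
    if not 0 <= q <= 93:
        raise ValueError("Phred score must be between 0 and 93")
    return chr(q + offset)


def char_to_phred(c, offset=33):
    """Decode a single quality character back to its Phred score."""
    return ord(c) - offset


if __name__ == "__main__":
    for q in (10, 20, 30, 40, 50):
        p = phred_to_error_prob(q)
        print(f"Q{q}: error 1 in {round(1 / p)}, accuracy {100 * (1 - p):.3f} %")
```

With offset 33, score 0 maps to `!` and score 93 maps to `~`, which is why one printable character per base is enough.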

  10. FASTQ File
      @SEQ_ID
      GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTC
      +SEQ_ID
      !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>
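
As a minimal sketch of how such a record could be read, the following Python parses four-line FASTQ records and decodes the quality string with the ASCII offset of 33 from the previous slide. It assumes unwrapped records (exactly four lines each); `parse_fastq` is an illustrative helper, not a library function.

```python
def parse_fastq(lines):
    """Parse four-line FASTQ records into (read_id, sequence, qualities) tuples.

    Assumes each record is exactly four lines: @id, sequence, +[id], qualities.
    """
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality strings differ in length")
        # Phred score = ASCII code minus 33
        scores = [ord(c) - 33 for c in qual]
        records.append((header[1:], seq, scores))
    return records


EXAMPLE = [
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTC",
    "+SEQ_ID",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>",
]

if __name__ == "__main__":
    read_id, seq, scores = parse_fastq(EXAMPLE)[0]
    print(read_id, len(seq), min(scores), max(scores))
```

In the example record the first base has quality character `!`, which decodes to Phred 0, so a base like that would normally be treated as unreliable.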

  11. Alignment or Mapping of Reads • reference genome (HG19): GATTACGGTACTTGCATAGCT • read: TACGGTACTTGCATA • recorded per read: chromosome + position + strand • output: sample.bam
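
To illustrate what "chromosome + position + strand" means, here is a deliberately naive sketch that looks for an exact match of a read, or of its reverse complement, in a reference string. Real aligners such as BWA use indexed, mismatch-tolerant search; this is only a toy model, and the chromosome name `chrDemo` is a made-up placeholder.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def map_read(read, reference, chrom="chrDemo"):
    """Exact-match mapping: return (chromosome, 1-based position, strand) or None."""
    pos = reference.find(read)
    if pos != -1:
        return (chrom, pos + 1, "+")
    # A read from the opposite strand matches as its reverse complement
    pos = reference.find(reverse_complement(read))
    if pos != -1:
        return (chrom, pos + 1, "-")
    return None


if __name__ == "__main__":
    reference = "GATTACGGTACTTGCATAGCT"
    print(map_read("TACGGTACTTGCATA", reference))
    print(map_read(reverse_complement("TACGGTACTTGCATA"), reference))
```

Both calls report position 4; only the strand differs, which is the information a BAM record keeps for each aligned read.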

  12. Run QC and filtering sample.bam

  13. sortedBAM file (sample.bam) • both reads • quality scores • chromosome • position • quality flag • duplicate flag • off target flag
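
The quality and duplicate flags listed above presumably correspond to bits of the FLAG field defined in the SAM/BAM specification. The sketch below decodes that bit field; the bit values follow the specification, while the short names and the `is_usable` rule are illustrative choices. An off-target status is not one of the standard FLAG bits, so it would have to come from a comparison with the capture target intervals.

```python
# FLAG bits as defined in the SAM specification; names are shortened for display
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_usable(flag):
    """Illustrative filter: keep mapped reads that are neither QC-failed nor duplicates."""
    return not flag & (0x4 | 0x200 | 0x400)


if __name__ == "__main__":
    print(decode_flag(83))
    print(decode_flag(1107), is_usable(1107))
```

FLAG 1107 is 83 plus the duplicate bit (1024), so the same read pair orientation is reported but the read would be dropped by the filter.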

  14. Coverage [figure: overlapping reads stacked on the reference sequence GATTACGGTACTTGCATAGCT] 5x coverage
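
Coverage at a position is the number of reads overlapping it. A minimal sketch of that count, assuming reads are given as (0-based start, length) pairs on a single reference; the read placements in the example are invented for illustration and are not taken from the slide.

```python
def per_base_depth(reads, ref_length):
    """Count how many reads cover each reference position.

    reads: iterable of (start, length) with 0-based start positions.
    Uses a difference array: +1 where a read starts, -1 just past its end.
    """
    diff = [0] * (ref_length + 1)
    for start, length in reads:
        if start < 0 or start >= ref_length or length <= 0:
            continue  # read lies outside the reference
        end = min(start + length, ref_length)
        diff[start] += 1
        diff[end] -= 1
    depth = []
    running = 0
    for delta in diff[:ref_length]:
        running += delta
        depth.append(running)
    return depth


if __name__ == "__main__":
    # Hypothetical placements of five reads on a 21-base reference
    reads = [(0, 16), (1, 15), (3, 15), (5, 14), (6, 15)]
    print(per_base_depth(reads, 21))
```

Positions in the middle of the example are covered by all five reads (5x), while the edges are covered by fewer.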

  15. Mean Coverage = bases on target / size of target

  16. % of Bases Above a Certain Threshold [figure: the same read stack on the reference GATTACGGTACTTGCATAGCT, with coverage labels 1x, 5x, 5x, 4x]
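
Both summary statistics can be computed from a list of per-base depths. The sketch below uses the four coverage values labelled on the slide (1x, 5x, 5x, 4x) as a tiny stand-in for a real target; the function names are illustrative.

```python
def mean_coverage(depths):
    """Mean coverage = bases on target / size of target."""
    if not depths:
        raise ValueError("target is empty")
    return sum(depths) / len(depths)


def percent_at_or_above(depths, threshold):
    """Percentage of target positions covered at least `threshold` times."""
    if not depths:
        raise ValueError("target is empty")
    covered = sum(1 for d in depths if d >= threshold)
    return 100.0 * covered / len(depths)


if __name__ == "__main__":
    depths = [1, 5, 5, 4]
    print(mean_coverage(depths))           # 3.75
    print(percent_at_or_above(depths, 4))  # 75.0
    print(percent_at_or_above(depths, 5))  # 50.0
```

The two numbers answer different questions: a high mean can hide poorly covered positions, which is why the percentage above a threshold is reported alongside it.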

  17. Variant Calling [figure: read stack on the reference GATTACGGTACTTGCATAGCT; the reads carry G at a position where the reference has A] G = homozygous alternative

  18. Variant Calling [figure: read stack on the reference GATTACGGTACTTGCATAGCT; some reads carry A and others G at the same position] A/G = heterozygous

  19. Variant Calling [figure: only a few reads cover the position on the reference GATTACGGTACTTGCATAGCT, showing both A and G] A/G = heterozygous?

  20. Variant Calling [figure: the same few reads, with bases marked by sequencing quality (poor vs. good)] G
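
The reasoning across the last four slides can be sketched as a toy genotype caller: discard low-quality base calls, then decide from the fraction of alternative bases. This is not how GATK or any production caller works (those use probabilistic models); the thresholds `min_qual`, `min_depth`, `het_low` and `het_high` are arbitrary assumptions chosen for illustration.

```python
from collections import Counter


def call_genotype(pileup, ref_base, min_qual=20, min_depth=2,
                  het_low=0.2, het_high=0.8):
    """Toy genotype call at one position.

    pileup: list of (base, phred_quality) for the reads covering the position.
    Returns e.g. "A/A", "A/G", "G/G", or "no call" if too few good bases remain.
    """
    bases = [base for base, qual in pileup if qual >= min_qual]
    if len(bases) < min_depth:
        return "no call"
    counts = Counter(bases)
    alt_counts = {b: n for b, n in counts.items() if b != ref_base}
    if not alt_counts:
        return f"{ref_base}/{ref_base}"
    # Most frequent non-reference base
    alt_base, alt_n = max(alt_counts.items(), key=lambda item: item[1])
    alt_fraction = alt_n / len(bases)
    if alt_fraction < het_low:
        return f"{ref_base}/{ref_base}"
    if alt_fraction > het_high:
        return f"{alt_base}/{alt_base}"
    return f"{ref_base}/{alt_base}"


if __name__ == "__main__":
    low_coverage = [("A", 8), ("G", 35), ("G", 36)]  # invented qualities
    print(call_genotype(low_coverage, "A", min_qual=0))  # ignores quality
    print(call_genotype(low_coverage, "A"))              # filters the poor A
```

Without the quality filter the three reads look heterozygous (A/G); once the poor-quality A is removed, only G remains, matching the change in the call between the two slides.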

  21. VCF File • chromosome • position • quality • annotations sample.vcf
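
A VCF data line is tab-separated, with the fixed columns CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. The sketch below splits one such line into a dictionary; the example line is invented for illustration and does not come from the study data.

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    chrom, pos, variant_id, ref, alt, qual, filt, info = fields[:8]
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            if "=" in item:
                key, value = item.split("=", 1)
                info_dict[key] = value
            else:
                info_dict[item] = True  # flag-type annotation without a value
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }


if __name__ == "__main__":
    # Hypothetical record: heterozygous A>G call
    example = "chr1\t12345\t.\tA\tG\t50.0\tPASS\tDP=7;AF=0.5;DB\tGT\t0/1"
    print(parse_vcf_line(example))
```

The INFO column is where annotations added by later tools end up, so keeping it as a key/value dictionary makes filtering on depth or frequency straightforward.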

  22. Variant Calling [figure: read stack on the reference GATTACGGTACTTGCATAGCT; some reads lack one base relative to the reference] deletion = heterozygous

  23. Paired-End Sequencing 2 x 100 bp

  24. Variant Calling: Mate Pairs • 400 bp: normal • 800 bp: deletion • 200 bp: insertion

  25. Variant Calling: Mate Pairs 400 bp normal translocation
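
The mate-pair logic of the last two slides can be sketched as a simple rule: compare the distance at which the two mates map on the reference with the expected fragment size, and check whether both mates land on the same chromosome. The 400 bp expectation comes from the slides; the 100 bp tolerance is an arbitrary assumption, since real libraries have a fragment size distribution.

```python
def classify_pair(chrom1, chrom2, mapped_distance, expected=400, tolerance=100):
    """Classify a mate pair from where its two reads map on the reference.

    mapped_distance: distance between the mates on the reference, in bp.
    """
    if chrom1 != chrom2:
        return "translocation"
    if mapped_distance > expected + tolerance:
        # Mates map further apart than expected: sample lacks sequence here
        return "deletion"
    if mapped_distance < expected - tolerance:
        # Mates map closer than expected: sample has extra sequence here
        return "insertion"
    return "normal"


if __name__ == "__main__":
    print(classify_pair("chr1", "chr1", 400))  # normal
    print(classify_pair("chr1", "chr1", 800))  # deletion
    print(classify_pair("chr1", "chr1", 200))  # insertion
    print(classify_pair("chr1", "chr7", 400))  # translocation
```

In practice a single discordant pair is weak evidence; callers usually require several pairs supporting the same event.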

  26. Variant Calling: Split Reads 800 bp genome mRNA (cDNA)

  27. Overview • Data Analysis • Applications • Example: Exome Sequencing

  28. Applications • Re-sequencing → full genome → SNPs and indels • Re-sequencing → mate pairs → structural variations • Re-sequencing → regional → SNPs and indels • Sequencing → de novo assembly • RNAseq • ChIPseq • …seq

  29. www.illumina.com

  30. Example: Exome Sequencing

  31. Exome Sequencing • funding by NGI-NCHA, NWO, BBMRI • n > 3,000 samples of random set from RS-I • start May 2011; Nimblegen • part of “CHARGE-S” effort: >5,000 exomes across 4 cohorts • CHARGE: Framingham, CHS, ARIC, Rotterdam Study • Expand with exome variants array?

  32. Exome vs Full Genome [figure: exons within the genome] • genome → 3 Gb • exome → ~30 Mb

  33. Exome Sequencing Workflow DNA isolation → Library preparation → Exome capture → Sequencing → Data analysis

  34. Exome capture

  35. Nimblegen SeqCap EZ v2 Capture • CCDS (Sept 2009) • miRBase (v14, Sept 2009) • RefSeq (Jan 2010) • 2,100,000 probes • 30,246 coding genes • 329,028 exons • 710 miRNAs • 36.5 Mb primary target • 44.1 Mb capture target

  36. Illumina TruSeq V3 2x100 PE Sequencing

  37. Data analysis: BWA-GATK pipeline
      Demultiplexing: • BclToFastQ (CASAVA) • Chastity Filter
      Alignment: • BWA (paired) • SortSam, MarkDuplicates (picard)
      Processing: • BaseQualityScore Recalibration, IndelRealignment
      Variant-Calling: • HaplotypeCaller • VQSR (GATK) • VarEval
      Analysis: • ANNOVAR, VCFtools • PlinkSeq, SKAT, R • Spotfire

  38. Sample QC and Variant QC

  39. RSX-2 Samples were sequenced to ~54x Mean Coverage [plot axes: Percentage of 44Mb covered 10x or better; Average Mean Depth of Coverage across the 44Mb SeqCap Exome]

  40. Mean Depth of Coverage by Flowcell [plot axes: Mean Depth of Coverage; Flowcell Number (Roughly Chronological Order)]

  41. Freemix Values by Flowcell [plot axes: Estimated Freemix Values; Flowcell Number (Roughly Chronological Order)]

  42. Determining Heterozygous Concordance versus 550k genotyping arrays [plot axes: Heterozygous Concordance; Flowcell Number (Roughly Chronological Order)]

  43. Comparing Concordance versus Freemix reveals cutoff around 13% correction [plot axes: Heterozygous Concordance; Estimated Freemix Values]

  44. Sample QC and Variant QC

  45. Number of Detected SNPs per Sample by Flowcell [plot axis: Flowcell Number (Roughly Chronological Order)]

  46. Heterozygous to Homozygous ratio per Sample by Flowcell [plot axis: Flowcell Number (Roughly Chronological Order)]

  47. Transition to Transversion Ratio • transition: purine ↔ purine (A ↔ G) or pyrimidine ↔ pyrimidine (C ↔ T) • transversion: purine ↔ pyrimidine
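
A minimal sketch of how this ratio could be computed from a list of single-nucleotide substitutions, given as (reference base, alternative base) pairs. The function names and the example variant list are illustrative only.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def titv_ratio(snps):
    """Transition to transversion ratio; None if there are no transversions."""
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    if transversions == 0:
        return None
    return transitions / transversions


if __name__ == "__main__":
    snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")]
    print(titv_ratio(snps))  # 3.0
```

Because random errors produce transversions about twice as often as transitions (eight possible transversions versus four transitions), a sample whose ratio drifts towards 0.5 is a hint of false-positive calls, which is why the ratio is tracked per sample.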

  48. Transition to Transversion Ratio per Sample by Flowcell [plot axis: Flowcell Number (Roughly Chronological Order)]

  49. QC and filtering results

  50. Things to Remember • NGS: many short reads that might contain errors • coverage indicates the number of independent reads that cover a base → needed to analyse a genome • FASTQ file → sequence + quality scores • BAM file → aligned reads • VCF file → called variants + annotation
