Analysis of structural genome variation in whole genome and exome sequencing data

Victor Guryev, November 13, 2018. 15th SNPs and human diseases course, Erasmus MC, Rotterdam.


  1. Analysis of structural genome variation in whole genome and exome sequencing data. Victor Guryev, November 13, 2018. 15th SNPs and human diseases course, Erasmus MC, Rotterdam

  2. Our genomes: base and structural variants

  3. NGS: how do we get our genomes?

  4. 1000 Genomes Project (1kG)
     Low-coverage whole-genome and deep exome sequencing of 2,500 individuals to discover 95% of variants at 1% frequency
     • Small variants: The 1000 Genomes Project Consortium, 2015. Nature 526:68-74
     • Structural variants: Sudmant et al., 2015. Nature 526:75-81

  5. Genome of the Netherlands (GoNL)
     • Position paper: Boomsma et al., 2013
     • Small variants: Francioli et al., 2014
     • Structural variants: Hehir-Kwa et al., 2016
     Comparison of 1000G and GoNL:
     • DNA source: cell lines (1000G); blood (GoNL)
     • Coverage: 3-4x (1000G); >12x (GoNL)
     • Data generation: multiple platforms (1000G); BGI/Illumina (GoNL)
     • Population: multiple, unrelated (1000G); Dutch only, trios, twins (GoNL)
     • Phenotype info: none (1000G); multiple (GoNL)
     Figure annotations: 500 bp; 90 bp; median base coverage: 12x

  6. SV classes and detection methods
     Structural genome variations (SVs), shown relative to the reference segment order ABCD:
     • Copy-number variants: deletion (ABD), duplication (ABCCCD)
     • Copy-balanced variants: inversion (ADCB), translocation (AB CD)
     Detection methods: aCGH; di-tag fosmid and NGS sequencing; Fibre-FISH

  7. Method 1: Read depth analysis (RD)
     Scope: copy-number changes
     Figure (expected distribution of tags): average coverage of 5 WGS/site; over a duplicated site, 10 WGS/site flanked by 5 WGS/site
     Tool examples: CNV-Seq (Xie & Tammi, 2009); CNVnator (Abyzov et al., 2011); SegSeq (Chiang et al., 2009); DWAC-Seq (our tool)
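
The read-depth idea on this slide can be sketched in a few lines: count reads per fixed window, compare each window with the median depth, and flag windows whose log2 ratio departs from zero. This is only an illustration under stated assumptions, not the algorithm of CNV-Seq, CNVnator, SegSeq or DWAC-Seq. The function name `call_windows`, the 0.4 log2 threshold and the diploid baseline are hypothetical choices made for the example.

```python
import math
from statistics import median

def call_windows(window_counts, ploidy=2, threshold=0.4):
    """Label windows as gain/loss/neutral from depth relative to the median.

    window_counts: read counts per equal-sized window (hypothetical input).
    threshold: assumed |log2 ratio| cutoff; real tools also model noise and GC bias.
    """
    baseline = median(window_counts)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    calls = []
    for count in window_counts:
        ratio = count / baseline
        copies = round(ploidy * ratio)  # naive copy-number estimate
        if count == 0 or math.log2(ratio) <= -threshold:
            label = "loss"
        elif math.log2(ratio) >= threshold:
            label = "gain"
        else:
            label = "neutral"
        calls.append((label, copies))
    return calls

# Toy profile echoing the slide: about 5 reads per site, 10 over a duplicated site.
depth = [5, 5, 10, 10, 5, 5, 2, 5]
calls = call_windows(depth)
print(calls)
```

Using the median rather than the mean as the baseline keeps a few amplified windows from shifting the reference level.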

  8. Method 2: Discordant pairs (DP)
     Figure: read pairs as sequenced vs. as mapped to the reference, for a normal pair, inversion, tandem duplication, insertion, deletion and translocation (Chr 7, Chr 5)
     Scope: copy-number and copy-neutral SVs at a resolution close to base pair
     Tool examples: Breakdancer (Chen et al., 2009); 123SV (our tool)
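
The pair signatures listed on this slide can be expressed as a small decision rule. The sketch below assumes a forward-reverse paired-end library and a hypothetical insert-size distribution (mean 500 bp, standard deviation 50 bp); `classify_pair` and its parameters are illustrative names, not the logic of Breakdancer or 123SV. Real callers also cluster many supporting pairs before reporting an event.

```python
def classify_pair(chrom1, pos1, strand1, chrom2, pos2, strand2,
                  mean_insert=500, sd=50, n_sd=3):
    """Classify one mapped read pair by the SV signature it suggests.

    Positions are leftmost mapping coordinates; strands are "+" or "-".
    The span between the two positions is used as a rough insert size.
    """
    if chrom1 != chrom2:
        return "translocation"
    if strand1 == strand2:
        return "inversion"
    # Order the two mates by position on the chromosome.
    (left_pos, left_strand), (right_pos, _) = sorted(
        [(pos1, strand1), (pos2, strand2)])
    if left_strand == "-":
        # Mates point away from each other: tandem duplication signature.
        return "tandem_duplication"
    span = right_pos - left_pos
    if span > mean_insert + n_sd * sd:
        return "deletion"      # mates map further apart than expected
    if span < mean_insert - n_sd * sd:
        return "insertion"     # mates map closer together than expected
    return "normal"

print(classify_pair("chr7", 1000, "+", "chr5", 2000, "-"))
print(classify_pair("chr1", 1000, "+", "chr1", 5000, "-"))
```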

  9. Method 3: Split-read mapping (SR)
     Scope: prediction of copy-number and copy-neutral SVs at nucleotide resolution
     Tool examples: Pindel (Ye et al., 2009); SRiC (Zhang et al., 2011)
     • Evidence from multiple reads
     • Advantage of paired reads (figure labels: anchor, split read)
     • Unmapped reads are good candidates for split-mapping
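
To make the split-read idea concrete, here is a toy sketch that tries every cut point in a read and checks whether the two parts match the reference exactly with a gap in between, which is the signature of a deletion. It assumes error-free reads and a unique first match; `split_map`, `min_anchor` and the toy reference are hypothetical and far simpler than what Pindel or SRiC do.

```python
def split_map(read, reference, min_anchor=5):
    """Find a two-part exact alignment of `read` that implies a deletion.

    Returns (left_end, right_start, deleted_bases) in 0-based reference
    coordinates, or None when the read has no such split alignment.
    """
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:cut], read[cut:]
        left = reference.find(prefix)          # first exact hit only (toy)
        if left == -1:
            continue
        left_end = left + cut                  # first reference base after the prefix
        right_start = reference.find(suffix, left_end)
        if right_start > left_end:             # a gap means reference bases were skipped
            return left_end, right_start, right_start - left_end
    return None

REFERENCE = "GATTACAGGCTTCAAGTCCGATAGCTAGGCATCGTTAGC"
# Simulate a read from a genome that lacks reference bases 12..23.
deleted_read = REFERENCE[4:12] + REFERENCE[24:32]
intact_read = REFERENCE[4:20]
print(split_map(deleted_read, REFERENCE))
print(split_map(intact_read, REFERENCE))
```

The breakpoint is reported at single-base resolution, which is what distinguishes split-read evidence from read-depth or discordant-pair evidence.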

  10. Method 4: Genome assembly (AS)
      Scope: various types of SVs, including large insertions
      Tool examples: de novo assemblers (SOAPdenovo, ABYSS, Allpaths-LG); BLAST/BLAT/BWASW search for comparison of contigs and the genome reference
      Figure: imperfect alignment of a contig to the reference

  11. Method applicability: base and physical coverage
      Figure: chromosome with base coverage ~1x and physical coverage ~4x
      Approach (marks under the columns base coverage / physical coverage):
      • Depth of coverage: !
      • Discordant pairs: !
      • Split-mapping: !
      • De novo assembly: ! !
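
The difference between the two coverage figures on this slide is easy to compute. The sketch below assumes the common definitions: base coverage counts sequenced bases only, while physical coverage counts the whole fragment spanned by a read pair. The library parameters (100 bp reads, 800 bp fragments, a 3 Gb genome) are hypothetical values picked to reproduce the ~1x and ~4x of the figure.

```python
def coverages(n_pairs, read_length, fragment_length, genome_size):
    """Return (base coverage, physical coverage) for a paired-end library."""
    base = n_pairs * 2 * read_length / genome_size      # bases actually sequenced
    physical = n_pairs * fragment_length / genome_size  # bases spanned by fragments
    return base, physical

base, physical = coverages(
    n_pairs=15_000_000,         # hypothetical library size
    read_length=100,            # hypothetical read length (bp)
    fragment_length=800,        # hypothetical outer fragment size (bp)
    genome_size=3_000_000_000,  # approximate human genome size
)
print(f"base coverage: {base:.1f}x, physical coverage: {physical:.1f}x")
```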

  12. Multi-method approaches to SV discovery
      • PINDEL (http://gmt.genome.wustl.edu/packages/pindel/): split-read mapping (very specific for short and mid-size variants)
      • DELLY (https://github.com/dellytools/delly): discordant read and split-read methods
      • LUMPY-SV (https://github.com/arq5x/lumpy-sv): multi-method tool
      • SURVIVOR, MetaSV, Parliament: creating a consensus or multi-sample callset
      • Parliament2: runs multiple tools (Breakdancer, BreakSeq, CNVnator, Delly, Lumpy, Manta) and creates a consensus callset; also available as a Docker container
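
One simple way to build a consensus callset is to merge calls of the same type that overlap reciprocally and keep those reported by several tools. The sketch below is a generic illustration of that idea, not the procedure of SURVIVOR, MetaSV or Parliament. The 50% reciprocal-overlap cutoff, the two-tool minimum and the tool names are assumptions.

```python
def reciprocal_overlap(a, b):
    """Smallest fraction of either call covered by the other (0.0 if disjoint)."""
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if a[0] != b[0] or overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))

def consensus(callsets, min_overlap=0.5, min_tools=2):
    """Greedy clustering of (chrom, start, end, svtype) calls across tools."""
    clusters = []  # each cluster: representative call plus supporting tools
    for tool, calls in callsets.items():
        for call in calls:
            for cluster in clusters:
                same_type = cluster["rep"][3] == call[3]
                if same_type and reciprocal_overlap(cluster["rep"], call) >= min_overlap:
                    cluster["tools"].add(tool)
                    break
            else:
                clusters.append({"rep": call, "tools": {tool}})
    return [(c["rep"], sorted(c["tools"]))
            for c in clusters if len(c["tools"]) >= min_tools]

# Hypothetical calls from three unnamed tools.
callsets = {
    "toolA": [("chr1", 1000, 2000, "DEL"), ("chr2", 500, 900, "DUP")],
    "toolB": [("chr1", 1100, 2050, "DEL")],
    "toolC": [("chr1", 1050, 1950, "DEL"), ("chr2", 5000, 6000, "DUP")],
}
merged = consensus(callsets)
print(merged)
```

The first call seen serves as the cluster representative, so the reported coordinates depend on tool order; production tools typically refine breakpoints instead.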

  13. GoNL pipeline for SV discovery

  14. GoNL SV detection [Hehir-Kwa et al., 2016]

  15. Genome sequencing: what do we get?
      GoNL variant list:
      • SNPs: 20.4M
      • Short indels (1-20 bp): 1.7M
      • Deletions (20-99 bp): 31.5k
      • Deletions (100+ bp): 20k
      • Mobile element insertions: 13k
      • Insertions: 2.2k
      • Duplications: 1.8k
      • Inversions: 90
      • Interchromosomal events: 60
      Per individual genome (compared to the reference genome):
      • 3.7M SNPs
      • 360k short indels (1-20 bp)
      • 5.2k medium deletions (20-100 bp)
      • 3.3k large deletions (100+ bp)

  16. Impact of structural variants
      GoNL: bases affected, by variant type (megabases):
      • SNVs: 20.4
      • Indels: 4.3
      • SVs: 75.3
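
The three figures on this slide (20.4, 4.3 and 75.3 megabases) can be turned into shares of all affected bases with a few lines of arithmetic. The variable names are illustrative only.

```python
# Megabases affected per variant type, as given on the slide.
bases_mb = {"SNVs": 20.4, "Indels": 4.3, "SVs": 75.3}

total_mb = sum(bases_mb.values())
share = {kind: 100 * mb / total_mb for kind, mb in bases_mb.items()}

for kind, pct in share.items():
    print(f"{kind}: {pct:.1f}% of affected bases")
```

Structural variants account for roughly three quarters of the affected bases, even though they are far less numerous than SNVs.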

  17. Alu Ya4 insertion in the PRAMEF4 gene (PRAME family member 4)
      • In a constitutive exon
      • Observed in 21 samples
      • Mutations in the gene are associated with melanoma
      [Hehir-Kwa et al., 2016]

  18. Complex variants: gene retrotransposition insertion polymorphism (GRIP)
      Figure: read pairs mapping between Chr15: 40.85 Mb and Chr7: 26.24 Mb ("to chr7", "to chr15"), with a deletion track
      Prevalence: about 40 cases in GoNL
      Mechanism: (retro)transposition
      Tools: discordant pairs (1-2-3-SV)
      [Hehir-Kwa et al., 2016]

  19. “Knock-outs” in our genome
      Figure: Chr13, transcription, splicing, reverse transcription, integration, Chr11

  20. Complex variants: MNPs, complex indels
      Mechanism: polymerase errors
      Tool example: GATK HaplotypeCaller
      Prevalence: ~3% of all indels are non-simple
      [Hehir-Kwa et al., 2016]

  21. Complex variants: non-allelic conversion
      Figure: father, mother, child
      Mechanism: gene conversion
      Tool examples: assembly, discordant pairs
      Prevalence: currently only several cases

  22. Complex variants: KRAB box domain containing 4 (aka ZNF673), a transcription regulator [Hehir-Kwa et al., 2016]

  23. New genomic segments

  24. New segments

  25. New segments: example Allele frequency in GoNL: 28%. 50% of Dutch population have it as
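
The two percentages on this slide are consistent with each other if one assumes Hardy-Weinberg proportions, which the slide does not state. Under that assumption, an allele frequency of 28% implies that roughly 48% of individuals carry at least one copy of the segment. The helper name below is illustrative.

```python
def genotype_frequencies(p):
    """Hardy-Weinberg genotype frequencies for an allele at frequency p."""
    q = 1 - p
    return {"hom_ref": q * q, "het": 2 * p * q, "hom_alt": p * p}

freqs = genotype_frequencies(0.28)          # allele frequency from the slide
carriers = freqs["het"] + freqs["hom_alt"]  # at least one copy of the segment
print(f"heterozygous: {freqs['het']:.1%}, carriers: {carriers:.1%}")
```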

  26. Change in expression level

  27. Change in transcript structure

  28. Complex variants: chromothripsis (figure: breakpoint coordinates on chr10, chr1 and chr4; legend symbol = DNA double-strand breaks)

  29. How dynamic are our genomes?
      Figure: x 250 (families)
      • 1,169 de novo candidate indels (sized 1-20 bp; 99 children). After validation by PCR and sequencing: 291 de novo indels (203 small deletions, 74 insertions, 14 complex indels)
      • 601 de novo candidate SVs (sized 20+ bp; 250 families, 258 non-identical children). After validation by PCR and sequencing: 41 de novo SVs (27 deletions, 8 duplications, 5 Alu insertions, 1 complex event)
      Genome Res (2015) 25:792-801
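
A quick check of the numbers on this slide: the validated subtypes add up to the reported totals, and the totals give a validation rate and an average per child. Dividing by 99 children for indels and 258 children for SVs follows the cohort sizes printed on the slide; treating those as the right denominators is an assumption of this sketch.

```python
candidates = {"indels": 1169, "SVs": 601}
validated = {
    "indels": {"small deletions": 203, "insertions": 74, "complex indels": 14},
    "SVs": {"deletions": 27, "duplications": 8, "Alu insertions": 5, "complex event": 1},
}
children = {"indels": 99, "SVs": 258}  # cohort sizes as printed on the slide

summary = {}
for kind in candidates:
    confirmed = sum(validated[kind].values())
    summary[kind] = {
        "confirmed": confirmed,
        "validation_rate": confirmed / candidates[kind],
        "per_child": confirmed / children[kind],
    }

for kind, row in summary.items():
    print(f"{kind}: {row['confirmed']} confirmed, "
          f"{row['validation_rate']:.1%} of candidates, "
          f"{row['per_child']:.2f} per child")
```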

  30. De novo SVs: size distribution

  31. De novo mutations: parental and familial bias. Non-uniform distribution of SVs, p = 0.0074 (panels: indels, SVs)

  32. What about targeted re-sequencing?
      Figure tracks: WGS father, mother, child; WES father, mother, child
      • The same methodologies are applicable to WES
      • RD analysis needs additional correction to account for variation in enrichment
      • Very limited sensitivity if the SV breakpoint is outside the enriched area
      Tool examples: GATK HaplotypeCaller, CONIFER, ExomeCNV
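
The extra correction needed for read-depth analysis in exome data can be illustrated with a reference panel: each target's depth is compared with the depth the same target typically receives in other samples, so that differences in capture efficiency cancel out. This is a minimal sketch of that principle, not the method of CONIFER or ExomeCNV. The function name and the toy counts are hypothetical.

```python
import math
from statistics import median

def wes_log2_ratios(test_counts, reference_panel):
    """Per-target log2 copy ratio of a test sample against a reference panel."""
    # Express each reference sample as fractions of its own total depth.
    fractions = [[c / sum(sample) for c in sample] for sample in reference_panel]
    # Typical share of reads per target: absorbs target-specific enrichment.
    baseline = [median(column) for column in zip(*fractions)]
    raw = [t / b if b > 0 else None for t, b in zip(test_counts, baseline)]
    # Re-centre on the median so sample-wide depth differences drop out.
    centre = median(r for r in raw if r)
    return [math.log2(r / centre) if r else None for r in raw]

# Five targets with very different capture efficiency (toy numbers).
panel = [
    [100, 400, 50, 200, 250],
    [200, 800, 100, 400, 500],
    [50, 200, 25, 100, 125],
]
# Test sample sequenced deeper, with half the expected depth on the second target.
test_sample = [200, 400, 100, 400, 500]
ratios = wes_log2_ratios(test_sample, panel)
print([round(r, 2) for r in ratios])
```

A log2 ratio near -1 on a single target is what a heterozygous deletion would look like; targets with no reads return `None` rather than an infinite ratio.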

  33. Catching SVs from targeted sequencing
      Figure tracks: deletion (Del); father, mother, child (WGS); father, mother, child (WES); gene annotation

  34. Not catching SVs with targeted sequencing: heterozygous deletion in the father, inherited by the child
      Figure tracks: father, mother, child (WGS); father, mother, child (WES); gene annotation

  35. SV imputation

  36. SV imputation (2)

  37. SV imputation (3)

  38. PacBio and OxNano: true long reads

  39. “Synthetic” long reads: 10x Chromium linked-reads

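  40. Take-home message: importance of SVs
      • Human vs chimp: single-base changes 1.23% of genome; structural 3% of genome; SNV:CNV ratio 1 : 2
      • Common variants (AF > 5%): single-base changes 5.948 Mb; structural 10.916 Mb; SNV:CNV ratio 1 : 2
      • Rare variants: single-base changes 6.625 Mb; structural 28.507 Mb; SNV:CNV ratio 1 : 4
      • Individual/family-specific variants: single-base changes 6.989 Mb; structural 43.317 Mb; SNV:CNV ratio 1 : 6
      • De novo variants (avg per kid): single-base changes 45 bp; structural 4,084 bp; SNV:CNV ratio 1 : 91
      • Somatic, ageing-related variants: single-base changes ?; structural ?; SNV:CNV ratio 1 : ?
      Sources: [Chimp genome consortium, 2005]; [Hehir-Kwa, ... Guryev, 2016]; [Kloosterman, ... Guryev, 2015]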
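
The SNV:CNV ratios in this table follow directly from the two rows above them. The check below reads "6,989 Mb" and "43,317 Mb" as decimal commas (6.989 and 43.317 Mb), an interpretation that the slide's own 1 : 6 ratio supports. The dictionary layout is illustrative.

```python
# (single-base changes, structural changes) per column, in the units of the slide.
table = {
    "Human vs chimp (% of genome)": (1.23, 3.0),
    "Common variants, AF > 5% (Mb)": (5.948, 10.916),
    "Rare variants (Mb)": (6.625, 28.507),
    "Individual/family-specific (Mb)": (6.989, 43.317),
    "De novo, avg per kid (bp)": (45, 4084),
}

ratios = {name: round(structural / single_base)
          for name, (single_base, structural) in table.items()}

for name, ratio in ratios.items():
    print(f"{name}: 1 : {ratio}")
```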

  41. Acknowledgements
      GoNL SV Team: Wigard Kloosterman (UMCU), Laurent C. Francioli (UMCU), Jayne Y. Hehir-Kwa (UMCN), Djie Tjwan Thung (UMCN), Tobias Marschall (CWI/MPI), Alexander Schoenhuth (CWI), Matthijs Moed (LUMC), Eric-Wubbo Lameijer (LUMC), Abdel Abdellaoui (VU), Slavik Koval (EMC/LUMC), Joep de Ligt (UMCN), Najaf Amin (EMC), Freerk van Dijk (UMCG), Lennart Karssen (EM/Polyomica), Leon Mei (LUMC), Kai Ye (LUMC/WASHU)
      GoNL steering committee: Paul de Bakker (UMCU), Dorret Boomsma (VU), Cornelia van Duin (EMC), Gert-Jan van Ommen (LUMC), Eline Slagboom (LUMC), Morris Swertz (UMCG), Cisca Wimenga (UMCG)
      University of Washington: Fereydoun Hormozdiari, Evan E. Eichler
      BGI Shenzhen: Jun Wang
      ERIBA, RuG, UMC Groningen: Diana Spierings, Marianna Bevova, Rene Wardenaar, Tristan de Jong, Peter Lansdorp
      Positions open: PhD student, scientific programmer
