Global patterns of copy number variation in humans from a population-based analysis


  1. Global patterns of copy number variation in humans from a population-based analysis. ICHG Kyoto. Jean Monlong. April 5, 2016. Bourque Lab, McGill University, Human Genetics Dept.

  2. Disclosure Information: I have no financial relationships to disclose.

  3. Copy-Number Variation

  4. Copy Number Variation (CNV): imbalanced genetic variation involving more than 500 bp.

  5. CNV detection from high-throughput sequencing (Baker 2012, Nature Methods).

  8. Low-mappability regions: repeat-rich regions, centromeres, telomeres; ∼13% of the human genome. More prone to CNV: enriched in segmental duplications (Sharp, Annual Review 2006); short tandem repeats are highly polymorphic (Warburton, BMC Genomics 2008); transposons are involved in CNV formation (Sen, AJHG 2006). Involved in phenotype and disease: short tandem repeats and gene expression (Gymrek, Nat. Genetics 2016); repeat CNVs involved in ∼30 genetic disorders (Mirkin, Nature 2007); retrotransposition in cancer (Lee, Science 2012).

  9. PopSV approach

  10. PopSV approach. Objective: test the entire genome, including low-mappability regions, and detect subtle abnormal coverage. PopSV is a population-based approach: use a set of reference experiments to detect abnormal patterns. Figure: number of reads mapped in the tested genomic window, for the sample and for the reference.
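The population-based idea on this slide can be sketched as a per-window z-score of the tested sample's read count against the reference samples. This is only an illustration and not PopSV's actual implementation: the function name `abnormal_windows`, the toy counts, and the threshold of 3 are assumptions, and real read counts would first need to be normalized across samples.

```python
from statistics import mean, stdev

def abnormal_windows(sample_counts, reference_counts, z_threshold=3.0):
    """Flag windows where the sample's coverage deviates from the references.

    sample_counts: read counts per genomic window for the tested sample.
    reference_counts: one list of per-window counts per reference sample.
    Returns a list of (window_index, z_score) for the flagged windows.
    """
    flagged = []
    for i, count in enumerate(sample_counts):
        ref = [r[i] for r in reference_counts]
        mu, sd = mean(ref), stdev(ref)
        if sd == 0:
            continue  # no variation in the references: z-score undefined
        z = (count - mu) / sd
        if abs(z) >= z_threshold:
            flagged.append((i, z))
    return flagged

# Toy data (hypothetical): 4 reference samples, 4 genomic windows.
# Window 2 has low coverage in every sample, as a low-mappability region might.
references = [
    [100, 98, 20, 101],
    [102, 101, 22, 99],
    [99, 100, 19, 100],
    [101, 103, 21, 98],
]
sample = [100, 52, 21, 150]
calls = abnormal_windows(sample, references)
```

In this toy example the low-coverage window is not flagged, because the sample's count is typical for that window in the references; only the windows with a clear drop or gain relative to the references are reported.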

  11. Benchmark and validation. Existing methods: FREEC (LASSO-based segmentation; GC and mappability correction) and cn.MOPS (multi-sample Bayesian-based segmentation). Whole-genome sequencing data: 45 samples, including 10 twin families (i.e. 2 twins + 2 parents), and 95 pairs of normal/tumor samples from renal cell carcinoma (CageKid).

  12. Benchmark and validation: replication in the twins; concordance with pedigree; replication in the paired tumor; concordance of different bin sizes; PCR validation; overall performance and performance in different repeat contexts.

  13. Validation conclusions. PopSV detects 3-5x more variants, over a wider genomic range. Robust across challenging regions: low-coverage regions, segmental duplications, DNA satellites, short tandem repeats, GC-rich/poor regions. Resolution down to half the bin size.

  14. CNV patterns in normal genomes

  17. CNV in normal genomes. 640 normal genomes: 45 samples from the Twin study (∼40X), 95 normal samples from renal cell carcinoma (∼54X), and 500 unrelated samples from GoNL (∼14X). Where are CNVs located? In centromeres, telomeres, segmental duplications, DNA satellites, short tandem repeats, transposable elements, exons, promoters? Control regions: same size distribution, randomly distributed.
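One way to build control regions with the same size distribution is to reuse each CNV's size and place it at a random position in the genome. The sketch below is an assumption about how this could be done, not the presenter's code: `random_controls` and `chrom_sizes` are hypothetical names, the coordinates are toy values, and assembly gaps are ignored.

```python
import random

def random_controls(cnvs, chrom_sizes, seed=0):
    """Draw one randomly placed control region per CNV, keeping its size.

    cnvs: list of (chrom, start, end) tuples, half-open coordinates.
    chrom_sizes: dict mapping chromosome name to its length in bp.
    """
    rng = random.Random(seed)
    names = list(chrom_sizes)
    weights = [chrom_sizes[name] for name in names]
    controls = []
    for _, start, end in cnvs:
        size = end - start
        # Pick a chromosome with probability proportional to its length.
        chrom = rng.choices(names, weights=weights)[0]
        # Pick a start so that the whole region fits on the chromosome.
        new_start = rng.randint(0, chrom_sizes[chrom] - size)
        controls.append((chrom, new_start, new_start + size))
    return controls

# Toy genome and CNV calls (hypothetical values).
chrom_sizes = {"chr1": 1_000_000, "chr2": 500_000}
cnvs = [("chr1", 10_000, 12_500), ("chr2", 300_000, 305_000)]
controls = random_controls(cnvs, chrom_sizes)
```

The proportion of CNVs overlapping a feature (for example, a centromere) can then be compared with the same proportion among the controls.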

  18. Enriched close to centromere/telomere/gap (CTG). Figure: cumulative proportion of regions (CNV vs. control) by distance to centromere/telomere/gap (bp).

  19. Enriched in SD and low-coverage regions.

  21. Going further: (1) control for the SD and CTG patterns; (2) look at other repeat classes. Control regions: randomly distributed, same size distribution, same proportion overlapping a segmental duplication, similar distance to CTG.
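The stricter control regions described on this slide could be obtained by rejection sampling: draw random regions of the CNV's size until one has the same segmental duplication (SD) status and a similar distance to CTG. This is a hedged sketch under assumptions, not the method used in the talk: the helper names, the single toy chromosome, and the 50% relative tolerance on the distance are all hypothetical.

```python
import random

def overlaps(start, end, intervals):
    """True if [start, end) overlaps any (s, e) interval."""
    return any(start < e and s < end for s, e in intervals)

def distance_to(start, end, intervals):
    """Distance in bp from [start, end) to the nearest interval (0 if overlapping)."""
    return min(max(s - end, start - e, 0) for s, e in intervals)

def matched_control(cnv, chrom_len, sd, ctg, rng, rel_tol=0.5, max_tries=10_000):
    """Sample a region with the CNV's size, SD status and a similar CTG distance."""
    start, end = cnv
    size = end - start
    target_sd = overlaps(start, end, sd)
    target_dist = distance_to(start, end, ctg)
    for _ in range(max_tries):
        s = rng.randint(0, chrom_len - size)
        e = s + size
        if overlaps(s, e, sd) != target_sd:
            continue  # SD status differs from the CNV: reject
        if abs(distance_to(s, e, ctg) - target_dist) <= rel_tol * max(target_dist, 1):
            return (s, e)
    return None  # no matching region found within max_tries

# Toy chromosome (hypothetical): two SDs, telomeres and a centromere as CTG.
rng = random.Random(1)
chrom_len = 1_000_000
sd = [(100_000, 150_000), (700_000, 760_000)]
ctg = [(0, 10_000), (480_000, 520_000), (990_000, 1_000_000)]
cnv = (120_000, 123_000)
control = matched_control(cnv, chrom_len, sd, ctg, rng)
```

Matching each control to one CNV keeps the SD-overlap proportion identical between the two sets by construction, at the cost of many rejected draws when the matching criteria are rare.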

  22. Controlling for SD and distance to CTG

  24. Controlling for SD and distance to CTG. Satellites: enrichment driven by the ALR/Alpha and (GAATG)n/(CATTC)n families. Short tandem repeats (STR): enrichment distributed across families, but stronger for larger STRs. Transposable elements (TE): SVA class enriched. Expected: L1HS, L1PA2 to L1PA5. Surprises: HERVH, LTR38, LTR4.

  25. Repeat CNVs and protein-coding genes

      | Set | CNVs | Genes with CNVs: exon | + promoter | + intron |
      |---|---|---|---|---|
      | All CNVs | 91733 | 7206 | 11341 | 13259 |
      | Low coverage | 26888 | 682 | 1151 | 1977 |
      | Extremely low coverage | 10010 | 347 | 465 | 521 |
      | STR | 4286 | 45 | 286 | 748 |
      | Satellite | 1822 | 2 | 21 | 33 |
      | TE | 20491 | 164 | 1747 | 3998 |
      | STR/Satellite/TE | 22313 | 166 | 1760 | 4014 |

      Repeat CNV: more than 90% of the CNV is annotated as repeat.
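The repeat CNV definition on this slide (more than 90% of the CNV annotated as repeat) can be illustrated by computing the fraction of a CNV's bases covered by repeat annotations. The function names and coordinates below are hypothetical, and the sketch assumes that all intervals lie on the same chromosome and use half-open coordinates.

```python
def repeat_fraction(cnv, repeats):
    """Fraction of the CNV's bases covered by at least one repeat annotation.

    cnv: (start, end), half-open.
    repeats: list of (start, end) intervals, which may overlap each other.
    """
    start, end = cnv
    covered = 0
    cursor = start  # bases before the cursor have already been counted
    for r_start, r_end in sorted(repeats):
        lo, hi = max(r_start, cursor), min(r_end, end)
        if hi > lo:
            covered += hi - lo
            cursor = hi
    return covered / (end - start)

def is_repeat_cnv(cnv, repeats, min_fraction=0.9):
    """True if more than min_fraction of the CNV is annotated as repeat."""
    return repeat_fraction(cnv, repeats) > min_fraction

# Toy example: a 1 kbp CNV and three overlapping repeat annotations.
cnv = (1000, 2000)
repeats = [(900, 1500), (1400, 1950), (1990, 2100)]
fraction = repeat_fraction(cnv, repeats)
```

Sorting the annotations and tracking a cursor avoids counting bases twice when repeat annotations overlap.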

  26. Conclusion

  28. Summary. PopSV uses reference samples, detects more CNVs, and is robust across the entire genome. In normal genomes: CNVs are enriched in low-coverage regions; there is specific enrichment in satellites, simple repeats, and TEs, not due to segmental duplication enrichment; the patterns are replicated across datasets but differ from somatic patterns; some CNVs in low-coverage regions or repeats hit exonic sequence.

  29. Guillaume Bourque, Simon Gravel, Mathieu Bourgey, Mathieu Blanchette, Louis Letourneau, Francois Lefebvre, Eric Audemard, Toby Hocking, Simon Girard, Patrick Cossette, Guy Rouleau, Caroline Meloche

  31. Workflow

  32. Replication in twins

  33. Robust across challenging regions. Figure: proportion of regions with concordant samples, PopSV calls vs. null, in four panels: by coverage class (low, expected, high), GC content, segmental duplication proportion, and simple repeat proportion.

  34. Robust across challenging regions, using only CNVs in extremely low coverage regions. Figure: the twin-study samples (mother, father, and two twins for each family, plus samples other1 to other5), annotated by family and by sample type (father, mother, twin, other).

  35. Resolution: 500 bp bins vs. 5 kbp bins. Figure: proportion overlapping 5 kbp-bin calls, by size of the 500 bp-bin call (0 to 20,000 bp).

  36. Control regions QC: SD, low-coverage and CTG distance control. Figure: proportion overlapping the feature, CNV vs. control, for each feature.

  37. Control regions QC: SD, low-coverage and CTG distance control. Figure: cumulative proportion of regions (CNV vs. control) by distance to centromere/telomere/gap (bp).

  38. Control regions. Figure: a random region of size S corresponds to a random base in the green area; margins of S/2 are marked.

  39. Controlling for SD and distance to CTG. Figure: significance (−log10 P-value) of depletion or enrichment for TE classes (SINE, SVA, LTR, LINE, DNA) and top TE families (SVA_F, SVA_E, SVA_D, MER65A, LTR4, LTR38-int, L1PA5, L1PA4, L1PA3, L1PA2, L1HS, HERVH-int, AluY) across cohorts (Twins, CK Normal, GoNL, CK Somatic).
