
Detection of copy number alterations and loss of heterozygosity



  1. Complete genome analysis: detection of copy number alterations and loss of heterozygosity
     Control-FREEC tutorial
     Valentina BOEVA, Institut Curie, INSERM, Mines ParisTech

  2. Workshop outline
     • Motivation for copy number detection in cancer samples
     • Control-FREEC tool presentation: methodology and functionalities
     • Control-FREEC tutorial on Galaxy: hands-on workshop

  3. Cancer genomes are often significantly rearranged
     Figure: a 24-color karyotype of a neuroblastoma cell line.

  4. In cancer genomes, it is important to detect CNAs and LOH
     CNAs (copy number alterations):
     • Large-scale genomic deletions
     • Large-scale genomic duplications
     • Amplicons (regions duplicated more than 10 times)
     LOH: loss of heterozygosity regions

  5. Amplification of an important gene can favor cancer development
     • MYCN amplification, which occurs in approximately 22% of primary neuroblastomas, is one of the most powerful prognostic factors identified to date. It is significantly associated with advanced-stage disease, rapid tumor progression, and poor prognosis.
     Figure: the part of chr2 containing MYCN and DDX1, present in more than 100 copies.

  6. Amplification of an important gene can favor cancer development (continued)
     Figures: Kaplan-Meier survival curves for 600 stage A, B, and Ds patients by MYCN status (event-free survival; y-axis: probability of event-free survival, %). Overall survival curve for MYCN-amplified neuroblastoma patients relative to treatment after induction chemotherapy: A, patients who underwent autologous bone marrow transplantation (ABMT)/peripheral-blood stem-cell transplantation (PBSCT); B, patients who did not undergo ABMT/PBSCT.
     Sources: Kawa K et al., JCO 1999; Schneiderman J. et al., 2008.

  7. Deletion in an important gene can favor cancer development
     • The patient was treated for breast and ovarian cancer.
     • She developed therapy-related acute myeloid leukemia (t-AML).
     • Whole-genome sequencing revealed a novel, heterozygous 3-kilobase deletion removing exons 7-9 of TP53 in the patient's normal skin DNA; the deletion was homozygous in the leukemia DNA as a result of acquired uniparental disomy.
     Adapted from C. Link et al., 2011

  8. Copy-neutral loss of heterozygosity (LOH), or acquired uniparental disomy (UPD), often happens in cancer
     In UPD, a person receives two copies of a chromosome, or part of a chromosome, from one parent and no copies from the other parent. This acquired homozygosity could lead to the development of cancer if the individual inherited a non-functional allele of a tumor suppressor gene. (From Wikipedia)

  9. Identification of regions of gain and loss helps to predict the aggressiveness of cancer
     Figure: copy number profile (chr11) of a metastatic neuroblastoma sample.

  10. Identification of regions of gain and loss helps to predict the aggressiveness of cancer (continued)
     Figure: Kaplan-Meier overall survival for patients with tumors with different genomic profiles. From Carén H et al., PNAS 2010;107:4323-4328.

  11-14. Detection of SNVs, indels, structural variants, copy number changes and LOH has become possible with Next Generation Sequencing (NGS)
     • Next Generation Sequencing = fast, accurate reading of DNA
     • Whole genome sequencing: sequencing of the whole cancer genome, including intergenic regions and introns; complete information about the genome
     • Exome sequencing: sequencing of the exons of ~20,000 well-characterized genes; complete information about SNVs, indels and copy number changes in the coding part of the genome
     • Targeted sequencing: complete information about SNVs, indels and copy numbers for a small panel of genes (10-500) actionable in cancer

  15. Today we will speak only about the detection of CNAs and LOH, and only in WGS and WES data
     Figure: screenshot.

  16. Read count (RC) is calculated in sliding windows
     Figure: read count in each window along the chromosome position, showing gain, normal and loss regions.
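The windowing step can be sketched as follows. This is a minimal illustration, not Control-FREEC code: the function `count_reads_per_window` and its `window`/`step` arguments are hypothetical, and the sketch assumes reads are already mapped and reduced to 0-based start positions on one chromosome, counting a read in a window when its start falls inside it.

```python
from bisect import bisect_left


def count_reads_per_window(read_starts, chrom_length, window, step=None):
    """Read count (RC) per window along one chromosome (hypothetical helper).

    read_starts: 0-based start positions of mapped reads on this chromosome.
    window: window length in bp.
    step: distance between consecutive window starts; defaults to `window`
          (non-overlapping windows). A smaller step gives overlapping windows.
    Returns a list of (window_start, read_count) pairs. A trailing stretch
    shorter than one window is not reported.
    """
    step = step or window
    positions = sorted(read_starts)
    counts = []
    for start in range(0, chrom_length - window + 1, step):
        # number of reads whose start lies in [start, start + window)
        n = bisect_left(positions, start + window) - bisect_left(positions, start)
        counts.append((start, n))
    return counts
```

Sorting once and using binary search keeps the cost at O(reads log reads + windows), even when `step` is smaller than `window`.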

  17. We need to normalize the read count per window to get meaningful profiles
     Figure: read count per 50-kb window along chr5 for the sample and for the control, with a region labeled "Loss".

  18. If a control is available, the problem is easily solved
     Figure: normalized read count per 50-kb window along chr5 (y-axis from 0.0 to 3.0), with the loss region labeled.
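A minimal sketch of control-based normalization, assuming the simplest scheme: divide sample by control read counts per window, then rescale so that the median ratio is 1 (taken to be the normal copy number). The function name is hypothetical, and this is not necessarily how Control-FREEC normalizes against a control.

```python
from statistics import median


def normalize_by_control(sample_rc, control_rc):
    """Per-window sample/control read-count ratio, rescaled so that the
    median ratio is 1 (assumed to be the normal copy number level).

    Windows with no control reads carry no information and are returned as None.
    """
    raw = [s / c if c > 0 else None for s, c in zip(sample_rc, control_rc)]
    scale = median(r for r in raw if r is not None)  # StatisticsError if all None
    if scale == 0:
        raise ValueError("median ratio is zero; cannot rescale")
    return [r / scale if r is not None else None for r in raw]
```

The median-based rescaling assumes that most windows sit at the baseline copy number; in a sample dominated by gains or losses the baseline would shift.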

  19. If there is no control dataset, normalization can be done using the GC-content
     Figure: control read count and GC-content along chr5.
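Without a control, each window needs its GC-content. A minimal sketch, assuming the reference sequence of the chromosome is available as a string and using the same window/step layout as the read counts; the function name is hypothetical, and `N` bases are excluded from the GC fraction.

```python
def gc_content_per_window(sequence, window, step=None):
    """GC fraction per window (hypothetical helper).

    The fraction is computed over A/C/G/T bases only, so N stretches do not
    dilute it; a window made only of N's yields None.
    """
    step = step or window
    seq = sequence.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        result.append(gc / acgt if acgt else None)
    return result
```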

  20. RC can be modeled as a polynomial of GC-content
     A scatter plot shows the dependency RC ~ GC-content (read count per 100-kb window versus GC-content).

  21. RC can be modeled as a polynomial of GC-content
     Figure: read count per 50-kb window versus GC-content for three datasets: control COLO-829BL (mate pairs), COLO-829 (mate pairs) and NCI-H2171 (paired ends). The main component corresponds to the normal copy number; the other components correspond to losses and gains.
     Here RC was modeled as a third-order polynomial of GC-content.
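The RC ~ GC-content model can be sketched with a third-order least-squares polynomial. To follow the main component rather than the gain and loss components, this sketch refits after excluding windows whose ratio RC / f(GC) is far from 1. The iteration scheme and the `tol` threshold are assumptions for illustration, not Control-FREEC's documented fitting procedure, and all function names are hypothetical.

```python
def polyfit(x, y, degree):
    """Ordinary least-squares polynomial fit via the normal equations.

    Returns coefficients c (lowest order first): f(g) = c[0] + c[1]*g + ...
    """
    n = degree + 1
    A = [[sum(xi ** (j + k) for xi in x) for k in range(n)] for j in range(n)]
    b = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= factor * A[col][k]
            b[r] -= factor * b[col]
    # back substitution
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (b[r] - sum(A[r][k] * c[k] for k in range(r + 1, n))) / A[r][r]
    return c


def polyval(c, g):
    """Evaluate the polynomial with coefficients c (lowest order first) at g."""
    return sum(ck * g ** k for k, ck in enumerate(c))


def fit_rc_vs_gc(gc, rc, degree=3, n_iter=3, tol=0.3):
    """Fit RC ~ f(GC) to the main component of the scatter plot.

    After each fit, windows whose ratio RC / f(GC) differs from 1 by more than
    `tol` are treated as likely gains or losses and excluded from the next fit.
    """
    keep = list(range(len(gc)))
    c = None
    for _ in range(n_iter):
        if len(keep) <= degree:
            raise ValueError("too few windows left to fit the polynomial")
        c = polyfit([gc[i] for i in keep], [rc[i] for i in keep], degree)
        keep = [i for i in range(len(gc))
                if polyval(c, gc[i]) > 0 and abs(rc[i] / polyval(c, gc[i]) - 1) < tol]
    return c
```

Solving the normal equations directly is fine for a degree-3 fit on GC values in [0, 1]; for higher degrees a QR-based solver would be numerically safer.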

  22. The resulting profiles are segmented to detect gains and losses
     Transformation: $NRC_i = \mathrm{ploidy} \cdot RC_i / f(g_i)$, where $RC_i$ is the read count in window $i$, $g_i$ is the GC-content of window $i$, $f$ is the fitted polynomial, and $NRC_i$ is the resulting normalized read count.
     Figure: normalized copy number along chr5 (3-kb windows), with windows marked as normal copy number, loss or gain.
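The transformation and a segmentation step can be sketched as below. The normalization follows the formula on the slide; the segmentation is a deliberately naive stand-in (running median, rounding to an integer copy number, merging runs) and is not Control-FREEC's own segmentation algorithm. Function names and the `half_width` parameter are hypothetical.

```python
from statistics import median


def normalized_copy_number(rc, gc, f, ploidy=2):
    """NRC_i = ploidy * RC_i / f(g_i), with f the fitted RC-vs-GC model."""
    return [ploidy * r / f(g) for r, g in zip(rc, gc)]


def naive_segments(nrc, ploidy=2, half_width=2):
    """Naive segmentation: running median, rounding to an integer copy number,
    then merging consecutive windows that share the same copy number."""
    n = len(nrc)
    smooth = [median(nrc[max(0, i - half_width): i + half_width + 1]) for i in range(n)]
    segments = []
    for i, value in enumerate(smooth):
        cn = round(value)
        if segments and segments[-1]["cn"] == cn:
            segments[-1]["end"] = i + 1  # extend the current segment
        else:
            segments.append({"start": i, "end": i + 1, "cn": cn})
    for seg in segments:
        if seg["cn"] > ploidy:
            seg["status"] = "gain"
        elif seg["cn"] < ploidy:
            seg["status"] = "loss"
        else:
            seg["status"] = "normal"
    return segments
```

The running median suppresses isolated noisy windows before rounding; a real segmentation method would instead look for change points in the unrounded profile.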

  23. In summary
     • Control-FREEC detects copy number alterations (CNAs) in whole genome sequencing data
     • Control-FREEC uses a sliding window approach
     • It also allows visualization of CNAs and LOH at the genome scale

  24. Visualization of copy number profiles calculated by FREEC
     Figure legend: normal copy number, loss, gain.

  25. There are three problems in genomic profiling
     1. The reference point for copy number variation (diploid, triploid, tetraploid genomes)
     Figure: normalized ratio (0.0 to 2.0) along the genome for a one-copy gain in a diploid genome and for a one-copy gain in a tetraploid genome.
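The reference-point problem comes down to simple arithmetic: a one-copy gain gives a ratio of 3/2 = 1.5 in a diploid genome but only 5/4 = 1.25 in a tetraploid genome, and the same observed ratio of 1.5 could mean 3 copies (diploid) or 6 copies (tetraploid). A small sketch, assuming a pure tumor sample; the function names are hypothetical.

```python
from fractions import Fraction


def expected_ratio(copy_number, ploidy):
    """Normalized ratio of a region with `copy_number` copies in a pure tumor
    whose baseline ploidy is `ploidy` (ratio 1 = baseline)."""
    return Fraction(copy_number, ploidy)


def copy_number_interpretations(ratio, ploidies=(2, 3, 4)):
    """Integer copy numbers compatible with an observed ratio, per assumed ploidy."""
    result = {}
    for p in ploidies:
        cn = Fraction(ratio) * p
        if cn.denominator == 1:
            result[p] = int(cn)
    return result
```

Exact fractions make the ambiguity explicit: a ratio of 1.5 is compatible with both a diploid and a tetraploid baseline, which is why the ploidy must be set or inferred before copy numbers are called.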

  26. There are three problems in genomic profiling (continued)
     2. Contamination of tumor samples by normal stromal cells

  27-28. We can evaluate the contamination of a tumor sample by normal cells
     Figures: normalized copy number along the genome (3-kb windows).
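One way to see how contamination can be evaluated is a two-population mixing model. Assume a tumor with a diploid baseline, normal cells with two copies, and a ratio normalized so that copy-neutral regions equal 1. A region with tumor copy number CN and normal-cell fraction c then has expected ratio ((1 - c) * CN + 2c) / 2, and inverting this for a region of known CN gives c. This is an illustrative model with hypothetical function names, not Control-FREEC's estimation procedure.

```python
def expected_ratio_with_contamination(cn, contamination):
    """Expected normalized ratio of a region with tumor copy number `cn`
    when a fraction `contamination` of cells are normal diploid cells.
    Assumes a diploid tumor baseline (copy-neutral ratio = 1)."""
    return ((1 - contamination) * cn + 2 * contamination) / 2


def estimate_contamination(observed_ratio, cn):
    """Invert the mixing model for a region assumed to have integer copy number cn."""
    if cn == 2:
        raise ValueError("copy-neutral regions carry no information about contamination")
    return (2 * observed_ratio - cn) / (2 - cn)
```

For example, a one-copy loss observed at ratio 0.7 instead of 0.5 gives c = (2 * 0.7 - 1) / (2 - 1) = 0.4, that is, 40% normal cells; a one-copy gain at ratio 1.3 instead of 1.5 gives the same estimate.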

  29. There are three problems in genomic profiling (continued)
     3. Intra-tumoral heterogeneity
     Figure from Kost-Alimova et al., BMC Cancer 2007.

  30. There are three problems in genomic profiling (continued)
     3. Intra-tumoral heterogeneity
     One solution: Tumor Heterogeneity Analysis (THetA), http://compbio.cs.brown.edu/projects/theta/
     L. Oesper, A. Mahmoody, and B.J. Raphael (2013). THetA: Inferring intra-tumor heterogeneity from high-throughput DNA sequencing data. Genome Biology 14:R80.

  31. Now we want to detect the genotype status of each region, including loss of heterozygosity (LOH)

  32. We characterize the allelic content via the B allele frequency (BAF)
     • B allele = alternative variant in dbSNP
     Figure: four SNP positions in the sequence ac G atgacgtca A atgctagcgag G cacacaa T ac. Reference genome (A) alleles: G, A, G, T; dbSNP (B) alleles: C, T, C, A; observed B allele frequencies: 0.44, 0.5, 0.57, 0.45.
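Computing BAF at a SNP position can be sketched as below, assuming per-position base counts from a pileup and known A (reference) and B (dbSNP alternative) alleles. The function name, the `min_depth` cutoff and the example counts are hypothetical; bases other than A or B are ignored as likely sequencing errors.

```python
def b_allele_frequency(base_counts, a_allele, b_allele, min_depth=10):
    """BAF at one SNP: reads supporting the B allele divided by reads
    supporting either the A or the B allele.

    base_counts: dict mapping nucleotide -> number of reads at this position.
    Returns None when fewer than `min_depth` reads support A or B.
    """
    a = base_counts.get(a_allele, 0)
    b = base_counts.get(b_allele, 0)
    if a + b < min_depth:
        return None
    return b / (a + b)
```

With 28 G reads and 22 C reads at a G/C SNP, the BAF is 22 / 50 = 0.44, matching the value shown for the first position in the figure.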

  33. There is a correspondence between copy number and the possible BAF values
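The correspondence can be written out explicitly: a region with n copies in a pure tumor carries k = 0..n B alleles, so BAF takes the values k/n (0, 1/2, 1 for n = 2; 0, 1/3, 2/3, 1 for n = 3). The sketch below also allows an assumed fraction of contaminating normal AB cells; it applies to SNPs that are heterozygous in the germline, and the function name is hypothetical.

```python
from fractions import Fraction


def possible_bafs(copy_number, contamination=0):
    """Expected BAF values at germline-heterozygous SNPs of a region with
    `copy_number` tumor copies, of which k = 0..copy_number carry the B allele.

    A fraction `contamination` of cells is assumed to be normal with genotype AB
    (two copies, one B allele). Pass a Fraction for exact results.
    """
    c = Fraction(contamination)
    denominator = (1 - c) * copy_number + 2 * c  # total allele copies in the mixture
    if denominator == 0:
        return []  # no DNA from this region (homozygous deletion in a pure tumor)
    return [((1 - c) * k + c) / denominator for k in range(copy_number + 1)]
```

With 40% normal AB cells, a copy-neutral LOH region (n = 2, k = 0 or 2) shifts from BAF 0 and 1 to 0.2 and 0.8, which correspond to the AA*0.6+AB*0.4 and BB*0.6+AB*0.4 modes used in the GMM fit.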

  34. We infer the genotype status of a region from B allele frequency profiles
     Figure: BAF profile with regions labeled AA or BB, and AB.

  35. To infer the genotype status of a region from B allele frequency profiles, we use a Gaussian mixture model (GMM) fit
     • We try different fits and choose the fit with the best likelihood
     • Fit with 3 modes (AA, AB, BB): the fit indicates that the genotype = AB
     • Fit with 4 modes (AA, BB, AA*0.6+AB*0.4, BB*0.6+AB*0.4): the fit indicates that the genotype = AA/BB with 40% contamination by normal ("AB") cells
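The model selection step can be sketched as an EM fit of a one-dimensional Gaussian mixture whose means are fixed by each genotype hypothesis, keeping the fit with the highest log-likelihood. Assumptions: only the weights and one shared standard deviation are estimated, the two candidate mean sets are the ones listed on the slide, and all names are hypothetical; this is not Control-FREEC's implementation. Raw likelihoods tend to favor models with more free parameters, so a penalized criterion such as BIC may be preferable when candidates differ in size.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_fixed_means(data, means, n_iter=200, sigma=0.05):
    """EM for a 1-D Gaussian mixture whose component means are fixed.

    Only the mixing weights and one shared standard deviation are estimated.
    Returns (log_likelihood, weights, sigma).
    """
    k = len(means)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each BAF value
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, sigma) for w, m in zip(weights, means)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weights and the shared sigma; means stay fixed
        weights = [sum(r[j] for r in resp) / len(data) for j in range(k)]
        var = sum(r[j] * (x - means[j]) ** 2
                  for x, r in zip(data, resp) for j in range(k)) / len(data)
        sigma = max(math.sqrt(var), 1e-3)  # floor avoids a degenerate zero-width fit
    log_lik = sum(math.log(sum(w * normal_pdf(x, m, sigma)
                               for w, m in zip(weights, means)) or 1e-300)
                  for x in data)
    return log_lik, weights, sigma


# Candidate fits from the slide: BAF means for each genotype hypothesis.
CANDIDATES = {
    "AB": [0.0, 0.5, 1.0],                             # AA, AB, BB
    "AA/BB with 40% normal AB": [0.0, 0.2, 0.8, 1.0],  # AA, AA*0.6+AB*0.4, BB*0.6+AB*0.4, BB
}


def best_fit(data, candidates=CANDIDATES):
    """Return the name of the candidate with the highest log-likelihood."""
    return max(candidates, key=lambda name: fit_fixed_means(data, candidates[name])[0])
```

Fixing the means to the values implied by each genotype hypothesis keeps the comparison interpretable: the winning candidate directly names the genotype status and, for the contaminated case, the normal-cell fraction.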

  36. Visualization of BAF

  37. Extending Control-FREEC to exome sequencing data
     Figure: uneven coverage of exons.
     • Exome data: capture bias; GC-content and mappability correction is not enough
     • A control sample is mandatory to normalize read counts

  38. Exome sequencing data may be much noisier than whole genome sequencing data
     Additional bias (capture) => additional noise
