
Structure determination of genomes and genomic domains by satisfaction of spatial restraints

Structure determination of genomes and genomic domains by satisfaction of spatial restraints. Assessing the limits of restraint-based 3D genomics. Marc A. Marti-Renom, Structural Genomics Group (ICREA, CNAG-CRG). http://marciuslab.org


  1. Structure determination of genomes and genomic domains by satisfaction of spatial restraints. Assessing the limits of restraint-based 3D genomics. Marc A. Marti-Renom, Structural Genomics Group (ICREA, CNAG-CRG). http://marciuslab.org http://3DGenomes.org http://cnag.crg.eu

  2. Hybrid Method. Baù, D. & Marti-Renom, M. A. Methods 58, 300–306 (2012). [Figure: experiments combined with computation, panels A to D; Chr.18]

  3. Restraint-based Modeling. Baù, D. & Marti-Renom, M. A. Methods 58, 300–306 (2012). Biomolecular structure determination: 2D-NOESY data. Chromosome structure determination: 3C-based data.

  4. http://3DGenomes.org [Figure: diagram with particles P1 and P2 and indices i, i+1, i+2, i+n]

  5. Are the models correct?

     Studies shown on the slide: Jhunjhunwala (2008) Cell; Fraser (2009) Genome Biology; Duan (2010) Nature; Ferraiuolo (2010) Nucleic Acids Research; Baù (2011) Nature Structural & Molecular Biology; Kalhor (2011) Nature Biotechnology; Umbarger (2011) Molecular Cell; Tjong (2012) Genome Research; Junier (2012) Nucleic Acids Research; Hu (2013) PLoS Computational Biology.

     Paper shown on the slide: Trussart, et al. (2015). Nucleic Acids Research. "Assessing the limits of restraint-based 3D modeling of genomes and genomic domains". Nucleic Acids Research Advance Access published March 23, 2015. doi: 10.1093/nar/gkv221

     Marie Trussart 1,2, François Serra 3,4, Davide Baù 3,4, Ivan Junier 2,3, Luís Serrano 1,2,5 and Marc A. Marti-Renom 3,4,5,*

     1 EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain, 2 Universitat Pompeu Fabra (UPF), Barcelona, Spain, 3 Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation (CRG), Barcelona, Spain, 4 Genome Biology Group, Centre Nacional d'Anàlisi Genòmica (CNAG), Barcelona, Spain and 5 Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

     Received January 16, 2015; Revised February 16, 2015; Accepted February 22, 2015

     ABSTRACT

     Restraint-based modeling of genomes has been recently explored with the advent of Chromosome Conformation Capture (3C-based) experiments. We previously developed a reconstruction method to resolve the 3D architecture of both prokaryotic and eukaryotic genomes using 3C-based data. These models were congruent with fluorescent imaging validation. However, the limits of such methods have not systematically been assessed. Here we propose the first evaluation of a mean-field restraint-based reconstruction of genomes by considering diverse chromosome architectures and different levels of data noise and structural variability. The results show that: first, current scoring functions for 3D reconstruction correlate with the accuracy of the models; second, reconstructed models are robust to noise but sensitive to structural variability; third, the local structure organization of genomes, such as Topologically Associating Domains, results in more accurate models; fourth, to a certain extent, the models capture the intrinsic structural variability in the input matrices and fifth, the accuracy of the models can be a priori predicted by analyzing the properties of the interaction matrices. In summary, our work provides a systematic analysis of the limitations of a mean-field restraint-based method, which could be taken into consideration in further development of methods as well as their applications.

     INTRODUCTION

     Recent studies of the three-dimensional (3D) conformation of genomes are revealing insights into the organization and the regulation of biological processes, such as gene expression regulation and replication (1–6). The advent of the so-called Chromosome Conformation Capture (3C) assays (7), which allowed identifying chromatin-looping interactions between pairs of loci, helped deciphering some of the key elements organizing the genomes. High-throughput derivations of genome-wide 3C-based assays were established with Hi-C technologies (8) for an unbiased identification of chromatin interactions. The resulting genome interaction matrices from Hi-C experiments have been extensively used for computationally analyzing the organization of genomes and genomic domains (5). In particular, a significant number of new approaches for modeling the 3D organization of genomes have recently flourished (9–14). The main goal of such approaches is to provide an accurate 3D representation of the bi-dimensional interaction matrices, which can then be more easily explored to extract biological insights. One type of methods for building 3D models from interaction matrices relies on the existence of a limited number of conformational states in the cell. Such methods are regarded as mean-field approaches and are able to capture, to a certain degree, the structural variability around these mean structures (15).

     We recently developed a mean-field method for modeling 3D structures of genomes and genomic domains based on 3C interaction data (9). Our approach, called TADbit, was developed around the Integrative Modeling Platform (IMP, http://integrativemodeing.org), a general framework for restraint-based modeling of 3D bio-molecular structures (16). Briefly, our method uses chromatin interaction frequencies derived from experiments as a proxy of spatial proximity between the ligation products of the 3C libraries. Two fragments of DNA that interact with high frequency are dynamically placed close in space in our models while two fragments that do not interact as often will be kept apart. Our method has been successfully applied to model the structures of genomes and genomic domains in eukaryote and prokaryote organisms (17–19). In all of our studies, the final models were partially validated by assessing their

     * To whom correspondence should be addressed. Tel: +34 934 020 542; Fax: +34 934 037 279; Email: mmarti@pcb.ub.cat

     © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
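The principle quoted from the introduction (frequently interacting fragments are placed close, rarely interacting ones are kept apart) can be illustrated with a toy objective function. This is only a sketch under stated assumptions, not the scoring function of TADbit or IMP: the frequency thresholds, the target distances and the harmonic penalty forms are all hypothetical values chosen for the example.

```python
import math

# Hypothetical parameters, chosen only for illustration (not from the slides).
HIGH_FREQ = 0.7   # pairs at or above this frequency are restrained to be close
LOW_FREQ = 0.3    # pairs at or below this frequency are restrained to be apart
CLOSE_NM = 100.0  # assumed target distance for frequent contacts
FAR_NM = 400.0    # assumed minimum distance for rare contacts


def objective(coords, freq):
    """Sum of squared restraint violations for a set of 3D particles."""
    score = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if freq[i][j] >= HIGH_FREQ:
                # Harmonic restraint: pull the pair toward CLOSE_NM.
                score += (d - CLOSE_NM) ** 2
            elif freq[i][j] <= LOW_FREQ and d < FAR_NM:
                # Lower-bound restraint: penalize only pairs that are too close.
                score += (FAR_NM - d) ** 2
    return score


# Made-up normalized interaction frequencies for three particles.
freq = [[1.0, 0.9, 0.1],
        [0.9, 1.0, 0.5],
        [0.1, 0.5, 1.0]]
good = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (500.0, 0.0, 0.0)]
bad = [(0.0, 0.0, 0.0), (300.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
print(objective(good, freq), objective(bad, freq))
```

A conformation that satisfies every restraint scores zero, and an optimizer would search for coordinates that lower this value; intermediate frequencies are left unrestrained in this sketch.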

  6. Toy models (by Ivan Junier). [Workflow diagram, from start to end] Simulated toy genome: circular, TAD-like (TAD1, TAD2, TAD3) or non-TAD-like, at 150 bp/nm, 75 bp/nm and 40 bp/nm. Matrix generation: contact map (contact if d < 200 nm), add Monte Carlo noise, simulated Hi-C matrices in set 0 (Δts = 10^0), set 1 (Δts = 10^1), set 2 (Δts = 10^2). Model building by TADbit: contacts to distances, create particles and add restraints, simulated annealing Monte Carlo, model selection (lowest objective function). Analysis: model analysis.
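The matrix-generation step of this workflow can be sketched as follows. The 200 nm contact cutoff comes from the slide; everything else is an assumption made for illustration: the random-walk ensemble stands in for the simulated toy genome, and the Gaussian jitter stands in for the "Monte Carlo noise", whose actual model the slide does not describe.

```python
import math
import random

CONTACT_NM = 200.0  # contact cutoff stated on the slide (d < 200 nm)


def contact_matrix(conformations):
    """Count, for each pair of particles, the conformations where they touch."""
    n = len(conformations[0])
    counts = [[0] * n for _ in range(n)]
    for coords in conformations:
        for i in range(n):
            for j in range(i + 1, n):
                if math.dist(coords[i], coords[j]) < CONTACT_NM:
                    counts[i][j] += 1
                    counts[j][i] += 1
    return counts


def add_noise(counts, level, rng):
    """Assumed noise model: Gaussian jitter of width `level`, clipped at zero."""
    n = len(counts)
    noisy = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = max(0.0, counts[i][j] + rng.gauss(0.0, level))
            noisy[i][j] = noisy[j][i] = value  # keep the matrix symmetric
    return noisy


# Hypothetical ensemble: 20 random walks of 5 particles with 150 nm steps.
rng = random.Random(0)
ensemble = []
for _ in range(20):
    walk = [(0.0, 0.0, 0.0)]
    for _ in range(4):
        step = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(s * s for s in step))
        walk.append(tuple(p + 150.0 * s / norm for p, s in zip(walk[-1], step)))
    ensemble.append(walk)

clean = contact_matrix(ensemble)
noisy = add_noise(clean, level=2.0, rng=rng)
```

Because consecutive particles sit 150 nm apart, they are in contact in every conformation, which gives the strong near-diagonal signal expected of an interaction matrix.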

  7. Toy interaction matrices. [Figure: interaction-frequency matrices spanning 0 to 1 Mb on both axes for set 0 (Δts = 10^0), set 4 (Δts = 10^4) and set 6 (Δts = 10^6)]

  8. Reconstructing toy models. chr40_TAD (α = 100, Δts = 10): TADbit-SCC 0.91, <dRMSD> 32.7 nm, <dSCC> 0.94. chr150_TAD (α = 50, Δts = 1): TADbit-SCC 0.82, <dRMSD> 45.4 nm, <dSCC> 0.86.
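The slide reports dRMSD and dSCC without defining them. A minimal sketch, assuming dRMSD is the root-mean-square difference between the pairwise distances of two models and dSCC is the Spearman correlation between those same distances, could look like this (the coordinates are made up for the example):

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """All inter-particle distances, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def drmsd(model_a, model_b):
    """Root-mean-square difference between the two sets of distances."""
    da, db = pairwise_distances(model_a), pairwise_distances(model_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def ranks(values):
    """Average ranks (tied values share the mean rank)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def dscc(model_a, model_b):
    """Spearman correlation: Pearson correlation of the distance ranks."""
    return pearson(ranks(pairwise_distances(model_a)),
                   ranks(pairwise_distances(model_b)))


# Hypothetical models (nm): a reference and a copy inflated by 10%.
reference = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0),
             (130.0, 90.0, 0.0), (20.0, 160.0, 70.0)]
scaled = [tuple(1.1 * c for c in p) for p in reference]
print(drmsd(reference, scaled), dscc(reference, scaled))
```

The inflated copy shows why the two measures are complementary: uniform scaling changes every distance, so dRMSD is non-zero, but it preserves the ordering of the distances, so the rank correlation stays at its maximum.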

  9. TADs & higher-res are “good”. [Figure: dRMSD (nm) against resolution (40, 75, 150)]

  10. Noise is “OK”. [Figure: <dRMSD> (nm) against TADbit-SCC for increasing noise levels (- to +); correlations shown: r = -0.88, -0.76, -0.94, -0.90, -0.91, -0.67, -0.90, -0.96, -0.87]

  11. Structural variability is “NOT OK”. [Figure: <dRMSD> (nm) against TADbit-SCC for increasing structural variability (- to +); r = -0.67]

  12. Can we predict the accuracy of the models? Example matrix: toy genome chr40_TAD; density 40 bp/nm; TADs: yes; noise: 150; Δts: 10^0; % Sig. Cont. EV: 32.3; skewness: -0.32; kurtosis: -0.69. [Figure: Z-score matrix; frequency distribution of Z-scores; eigenvalues (% contribution) against eigenvalue index (log scale); <dRMSD> (nm) against Skewness (SK), Kurtosis (KT) and % Sig. Cont. eigenvalues (SEV), with correlations r = -0.53, r = 0.63 and r = 0.75 shown across the three panels]
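Two of the matrix descriptors named on this slide, skewness (SK) and kurtosis (KT), can be sketched in a few lines. The assumptions here are mine, not the slide's: values are taken from the upper triangle of the matrix, converted to Z-scores with population moments, and kurtosis is reported as excess kurtosis (normal distribution = 0), which is consistent with the negative value shown. SEV is left out because the slide does not define how significant eigenvalues are selected.

```python
import math


def upper_triangle(matrix):
    """Off-diagonal values above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def zscores(values):
    """Standardize values with the population mean and standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def skewness(z):
    """Third standardized moment; 0 for a symmetric distribution."""
    return sum(v ** 3 for v in z) / len(z)


def excess_kurtosis(z):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    return sum(v ** 4 for v in z) / len(z) - 3.0


# Made-up symmetric interaction matrix, for illustration only.
matrix = [[0, 9, 4, 1],
          [9, 0, 8, 3],
          [4, 8, 0, 7],
          [1, 3, 7, 0]]
z = zscores(upper_triangle(matrix))
sk, kt = skewness(z), excess_kurtosis(z)
print(sk, kt)
```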

  13. Skewness “side effect”. [Figure: <dRMSD> (nm) against Skewness (SK), with one series for increasing noise levels (- to +) and one for increasing structural variability (- to +)]

  14. Can we predict the accuracy of the models? MMP = −0.0002 × Size + 0.0335 × SK − 0.0229 × KU + 0.0069 × SEV + 0.8126 (KU is the kurtosis, labelled KT elsewhere on the slides). Example: Human Chr1:120,640,000-128,040,000, with Size: 186, SEV: 3.63, SK: 0.20, KT: -0.53, giving MMP: 0.82. [Figure: MMP score against dSCC, r = 0.84]
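The MMP formula on this slide is a plain linear combination, so it can be checked directly against the human Chr1 example. The function and argument names below are hypothetical; the coefficients and the input values are the ones shown on the slide.

```python
def mmp_score(size, sk, ku, sev):
    """MMP score from matrix size, skewness, kurtosis and SEV (slide 14)."""
    return (-0.0002 * size
            + 0.0335 * sk
            - 0.0229 * ku
            + 0.0069 * sev
            + 0.8126)


# Human Chr1:120,640,000-128,040,000, values as given on the slide.
example = mmp_score(size=186, sk=0.20, ku=-0.53, sev=3.63)
print(round(example, 2))  # 0.82, matching the MMP reported on the slide
```

Term by term, the example gives −0.0372 + 0.0067 + 0.0121 + 0.0250 + 0.8126 ≈ 0.819, which rounds to the 0.82 on the slide.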

  15. Higher-res is “good”: put your $$ in sequencing. Noise is “OK”: no need to worry much. Structural variability is “NOT OK”: homogenize your cell population! ...but we can differentiate between noise and structural variability, and we can a priori predict the accuracy of the models.
