Signed Scores for Contacts and Solutions in S MILP
⊲ Exosome (E Ref = C Xtal): signed contact scores, and scores for solutions
⊲ Proteasome (E Ref): signed contact scores, and scores for solutions
[Figure: for each complex, a bar chart of the signed score of each contact (axes: Contacts, Signed Score) and a histogram of the scores of solutions (axes: Solution Score, #Solutions)]
⊲ Take-home message: very few false positives ... and yet for good reasons.
Parsimony and Precision for Individual Solutions in 𝒮_MILP: Yeast Exosome
⊲ Algorithm NI: genetic algorithm by Robinson et al.
Complex | #types | E_Ref | |E_Ref| | |S_NI| | P_NI;E_Ref(S_NI)
Exosome | 10 | C_Xtal | 26 | 12 | 12
19S Lid | 9 | C_Cryo ∪ C_Dim ∪ C_XL | 19 | 9 (NC*) | 8
eIF3 | 12 | C_Cryo ∪ C_Dim ∪ C_XL | 17 | 17** | 14
⊲ MILP
Complex | #types | E_Ref | |E_Ref| | |S_MILP| | |𝒮_MILP| | P_MILP;E_Ref(𝒮_MILP) | |𝒮^cons_MILP| | P_MILP;E_Ref(𝒮^cons_MILP)
Exosome | 10 | C_Xtal | 26 | 10 | 1644 | (7, 9, 10) | 12 | (8, 9, 10)
19S Lid | 9 | C_Cryo ∪ C_Dim ∪ C_XL | 19 | 10 | 324 | (7, 8, 10) | 18 | (8, 9, 10)
eIF3 | 12 | C_Cryo ∪ C_Dim ∪ C_XL | 17 | 13 | 180 | (8, 10, 12) | 36 | (9, 10, 11)
⊲ Greedy
Complex | #types | E_Ref | |E_Ref| | |S_G| | |𝒮_Greedy| | P_Greedy;E_Ref(𝒮_Greedy) | |𝒮^cons_Greedy| | P_Greedy;E_Ref(𝒮^cons_Greedy)
Exosome | 10 | C_Xtal | 26 | 10 | 756 | (7, 9, 10) | 756 | (7, 9, 10)
19S Lid | 9 | C_Cryo ∪ C_Dim ∪ C_XL | 19 | 10 | 324 | (7, 8, 10) | 18 | (8, 9, 10)
eIF3 | 12 | C_Cryo ∪ C_Dim ∪ C_XL | 17 | 13 | 108 | (9, 10, 12) | 36 | (9, 10, 11)
⊲ Take-home message:
– MILP is more parsimonious than NI
– more than 80% of edges in consensus solutions: true positives
Precision for the Union of Solutions in 𝒮_MILP
⊲ For each protein: union of neighborhoods versus contacts in the assembly
⊲ Symmetric difference between two sets S and R:
$$S \,\Delta_s\, R = (\,|S \setminus R|,\ |S \cap R|,\ |R \setminus S|\,). \qquad (1)$$
⊲ Applied to the union of neighborhoods vs reference contacts:
$$N(p, \mathcal{S}_A) \,\Delta_s\, N(p, R) \equiv \Big(\bigcup_{S \in \mathcal{S}_A} N(p, S)\Big) \,\Delta_s\, N(p, R). \qquad (2)$$
⊲ Results (false positives, true positives, missed contacts)
Protein | Ref. Degree | N(p, 𝒮) Δ_s N(p, R)
Dis3 | 4 | (1, 4, 0)
Rrp4 | 5 | (2, 3, 2)
Rrp43 | 6 | (3, 6, 0)
Rrp45 | 7 | (2, 6, 1)
Rrp46 | 5 | (0, 4, 1)
Rrp41 | 4 | (2, 4, 0)
Rrp40 | 4 | (0, 3, 1)
Csl4 | 6 | (2, 4, 2)
Rrp42 | 5 | (2, 5, 0)
Mtr3 | 6 | (0, 3, 3)
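The triple of Eq. (1) and the union of neighborhoods of Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function names and the toy edge lists are hypothetical, and the data are not the exosome results.

```python
def signed_sym_diff(s, r):
    """Return (|S \\ R|, |S ∩ R|, |R \\ S|), i.e. the triple of Eq. (1)."""
    s, r = set(s), set(r)
    return (len(s - r), len(s & r), len(r - s))


def union_of_neighborhoods(protein, solutions):
    """Union, over all solutions, of the neighbors of `protein`.

    Each solution is an iterable of undirected edges (u, v).
    """
    neighbors = set()
    for edges in solutions:
        for u, v in edges:
            if u == protein:
                neighbors.add(v)
            elif v == protein:
                neighbors.add(u)
    return neighbors


# Hypothetical toy data: two solutions and one reference contact set.
solutions = [
    [("A", "B"), ("B", "C")],
    [("A", "B"), ("A", "D")],
]
reference = [("A", "B"), ("A", "C")]

predicted = union_of_neighborhoods("A", solutions)    # {"B", "D"}
expected = union_of_neighborhoods("A", [reference])   # {"B", "C"}
result = signed_sym_diff(predicted, expected)
print(result)  # (false positives, true positives, missed contacts) = (1, 1, 1)
```

In each row of the results table, the second and third entries of the triple add up to the reference degree of the protein.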
Modeling Contacts in Macro-molecular Assemblies Problem Statement Hardness and Algorithms — Computer Science Results — Structural Biology Outlook
Outlook
⊲ Structural Biology
– Mass spec. for protein complexes: about to revolutionize structural biology → reference algorithms for connectivity inference
– Excellent agreement with experimental data
– Solutions more parsimonious than previously computed ones
– For current examples: MILP always succeeds
– Software: about to be released (MILP, Greedy)
⊲ Computer science: selected open questions
– MILP struggles to outperform Greedy: is the approx. factor tight?
– Structure of the solution set depending on structural properties of the unknown graph (min cuts, minimum connectivity, degree of nodes) and on the structure of the Hasse diagram of vertex sets (hierarchical vs flat)
– Problem size: moving from ∼10 to ≤ 500 vertices; multiplicity issues appear: multiple copies per protein
– Beyond topological information: 3D embedding of the solutions?
References
◮ D. Agarwal, J. Araujo, C. Caillouet, F. Cazals, D. Coudert, and S. Perennes, Connectivity Inference in Mass Spectrometry based Structure Determination, European Symposium on Algorithms (LNCS 8125), 2013.
◮ D. Agarwal, C. Caillouet, F. Cazals, and D. Coudert, Unveiling Contacts within Macro-molecular assemblies by solving Minimum Weight Connectivity Inference Problems, submitted, 2014.
Overview PART 1:Connectivity Inference from Native Mass Spectrometry Data PART 2:Building Coarse Grain Models PART 3:Handling uncertainties in Macro-molecular Assembly Models PART 4:Conformational Ensembles and Energy Landscapes: Analysis PART 5:Conformational Ensembles and Energy Landscapes: Comparison
Greedy Geometric Algorithms for Collections of Balls, with Applications to Geometric Approximation and Molecular Coarse-Graining
F. Cazals and T. Dreyfus and S. Sachdeva and N. Shah
[Figure: three approximations of a molecule: (A) Inner, (B) Outer, (C) Interpolated]
Modeling Contacts in Macro-molecular Assemblies Problem Statement Results Algorithm Outlook
Separating the Molecules: Finding (Thick) Cracks Within a Map ⊲ NPC: probability density maps ⊲ Cryo-EM density maps ⊲ Antelope canyon, AZ, USA
Checkpoint ⊲ Consider a planar domain D defined by a simple curve. To cover domain D with balls, where should these balls be centered?
Coarse Graining with a Fixed Budget of k balls: Overview
⊲ Three approximation problems of a given input shape:
– inner approximation with largest volume
– outer approximation with least extra volume
– volume preserving approximation
⊲ From crystal structure: inner / outer / interpolated approximations
– 3sgb (1690 atoms), approximated with 85 balls (5% of atoms)
[Figure: (A) Inner, (B) Outer, (C) Interpolated]
⊲ NB: weighted versions accommodated too
Coarse Graining with a Fixed Budget of k balls: Problems
⊲ Input: F_O defined by a union of n balls
⊲ Output: k < n balls defining the approximation F_S
⊲ Three problems:
◮ inner approximation: F_S ⊂ F_O
◮ outer approximation: F_O ⊂ F_S
◮ interpolated approximation: an approximation sandwiched between the inner and outer approximations
◮ Volume preserving approximation: Vol(F_S) = Vol(F_O)
[Figure: the approximation problems illustrated on three balls P_1, P_2, P_3]
Greedy Assessment: Volume Covered
Incidence of the Topology
⊲ Input domain versus domain of the selection: volume comparisons
– F^r_O: input balls expanded by a quantity r → r = 0: input model
– F^r_S: domain of the selection for the expanded model
– Assessment: Vol(F^r_S) / Vol(F^r_O) for increasing r
⊲ PDB code 3sgb: 1690 balls
⊲ PDB code 1igt: 10416 balls
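Greedy Assessment: (Signed) Hausdorff Distance
⊲ Signed dist. of point p w.r.t. compact domain F:
$$s(p, \partial F) = \begin{cases} -\min_{q \in \partial F} d(p, q) & \text{if } p \in F, \\ +\min_{q \in \partial F} d(p, q) & \text{otherwise.} \end{cases}$$
⊲ Distance between boundaries: input domain ∂F_O vs selection ∂F_S:
$$S_H(\partial F_O, \partial F_S) = \Big[\ \underbrace{\min_{p \in \partial F_S} s(p, \partial F_O)}_{d_1},\ \underbrace{\max_{p \in \partial F_S} s(p, \partial F_O)}_{d_2};\ \underbrace{\min_{p \in \partial F_O} s(p, \partial F_S)}_{d_3},\ \underbrace{\max_{p \in \partial F_O} s(p, \partial F_S)}_{d_4}\ \Big]$$
[Figure: input domain and approximation, with the four distances d_1, d_2, d_3, d_4]
⊲ Assessment on a set of 96 protein complexes (1008 to 13214 atoms)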
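To make the four quantities d_1, ..., d_4 concrete, here is a minimal Python sketch for the simplest possible case: both domains are single disks in the plane, for which the signed distance has a closed form, and the boundaries are sampled. The function names and the two disks are illustrative assumptions; the slides deal with unions of balls in 3D, which this sketch does not handle.

```python
import math


def signed_dist_to_disk(p, center, radius):
    """Signed distance from p to the boundary circle: negative inside the disk."""
    return math.dist(p, center) - radius


def circle_samples(center, radius, n=360):
    """n points evenly spread on the boundary circle."""
    cx, cy = center
    return [
        (cx + radius * math.cos(2 * math.pi * i / n),
         cy + radius * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]


def signed_hausdorff(disk_o, disk_s, n=360):
    """(d1, d2, d3, d4): min/max signed distance of the sampled boundary of F_S
    w.r.t. F_O, then of the sampled boundary of F_O w.r.t. F_S."""
    (co, ro), (cs, rs) = disk_o, disk_s
    s_wrt_o = [signed_dist_to_disk(p, co, ro) for p in circle_samples(cs, rs, n)]
    o_wrt_s = [signed_dist_to_disk(p, cs, rs) for p in circle_samples(co, ro, n)]
    return (min(s_wrt_o), max(s_wrt_o), min(o_wrt_s), max(o_wrt_s))


# Hypothetical example: input disk of radius 2, inner approximation of radius 1.5.
d1, d2, d3, d4 = signed_hausdorff(((0.0, 0.0), 2.0), ((0.0, 0.0), 1.5))
print(d1, d2, d3, d4)  # about -0.5, -0.5, 0.5, 0.5
```

For an inner approximation, d_1 and d_2 are non-positive (the boundary of the selection lies inside the input), while d_3 and d_4 are non-negative.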
Volume Preserving Approximations: Results
e | k/n | d_1 | d_2 | d_3 | d_4
r_w | 0.01 | −8.39 ± 1.76 | 7.26 ± 1.74 | −6.12 ± 1.77 | 5.54 ± 1.38
r_w | 0.02 | −7.64 ± 1.76 | 5.46 ± 1.11 | −7.11 ± 2.41 | 4.89 ± 1.63
r_w | 0.05 | −5.61 ± 1.63 | 2.94 ± 0.85 | −7.43 ± 2.38 | 4.76 ± 2.44
r_w | 0.10 | −4.05 ± 1.71 | 2.77 ± 1.52 | −7.80 ± 1.80 | 5.25 ± 2.23
r_w | mean | −6.48 ± 2.42 | 4.66 ± 2.30 | −7.10 ± 2.21 | 5.11 ± 1.98
5.6 | 0.01 | −3.17 ± 0.88 | 3.49 ± 0.34 | −4.36 ± 0.78 | 2.43 ± 0.24
5.6 | 0.02 | −2.25 ± 1.54 | 2.58 ± 0.22 | −3.55 ± 0.61 | 1.49 ± 0.15
5.6 | 0.05 | −0.91 ± 0.35 | 1.68 ± 0.14 | −2.77 ± 1.11 | 0.65 ± 0.91
5.6 | 0.10 | −0.38 ± 0.12 | 1.08 ± 0.13 | −1.68 ± 0.47 | 0.28 ± 0.07
5.6 | mean | −1.92 ± 1.44 | 2.41 ± 0.89 | −3.33 ± 1.20 | 1.38 ± 0.94
⊲ Take home message: with a number of balls ∼ 5% of atoms
– molecular volume exactly preserved
– distance between surfaces ∼ 2−3 atoms (SAS model)
Medial Axis and Relatives
⊲ For any open set R ⊂ ℝ^n:
◮ Medial axis: points of R with at least two nearest neighbors on the boundary of R
◮ Skeleton: centers of maximal balls
◮ Singular set: points where the distance function is not differentiable
⊲ For a smooth curve/surface: MA ⊂ Skeleton
⊲ Skeleton and local thickness:
◮ Local: curvature properties
◮ Global: related to bi/tri/tetra-tangent balls
⊲ Medial axis transform: MAT
[Figure: medial axis of a planar curve C, with points of types A_1^2, A_1^3 and A_3]
Max k-cover and the Greedy Strategy
⊲ max k-cover:
– A: alphabet of m points
– C: collection of subsets of A
– Select k subsets from C maximizing the number of points from A which are covered
⊲ Hardness:
– problem is NP-complete
– OPT cannot be approximated within 1 − 1/e + ε unless P = NP
– Greedy algorithms achieve the 1 − 1/e bound
⊲ Greedy may fail:
[Figure: points A_1, ..., A_6 with weights 1, 1, 2, 2, 4, 4; row subsets C_1 = {A_1, A_2} (weight 2), C_2 = {A_3, A_4} (weight 4), C_3 = {A_5, A_6} (weight 8); column subsets C_4 and C_5 (weight 7 each)]
– Greedy: C_3 + C_2 = 12
– OPT: C_4 + C_5 = 14
⊲ Ref: Feige; J. ACM; 1998
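The failure case of the greedy strategy can be replayed in a few lines of Python. The instance below is an assumption: it is one reading of the figure (six weighted points, three "row" subsets and two "column" subsets, k = 2) that reproduces the two totals quoted on the slide. The function names are illustrative.

```python
from itertools import combinations

# Assumed instance, reconstructed from the figure: point weights and subsets.
weights = {"A1": 1, "A2": 1, "A3": 2, "A4": 2, "A5": 4, "A6": 4}
subsets = {
    "C1": {"A1", "A2"},
    "C2": {"A3", "A4"},
    "C3": {"A5", "A6"},
    "C4": {"A1", "A3", "A5"},
    "C5": {"A2", "A4", "A6"},
}


def covered_weight(names):
    """Total weight of the points covered by the named subsets."""
    covered = set().union(*(subsets[n] for n in names))
    return sum(weights[a] for a in covered)


def greedy_cover(k):
    """Pick k subsets, each time taking the one with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        best = max((n for n in subsets if n not in chosen),
                   key=lambda n: covered_weight(chosen + [n]))
        chosen.append(best)
    return chosen, covered_weight(chosen)


def optimal_cover(k):
    """Brute force over all k-subsets of the collection (tiny instances only)."""
    best = max(combinations(subsets, k), key=covered_weight)
    return list(best), covered_weight(best)


greedy_choice, greedy_value = greedy_cover(2)
opt_choice, opt_value = optimal_cover(2)
print(greedy_choice, greedy_value)  # ['C3', 'C2'] 12
print(opt_choice, opt_value)        # ['C4', 'C5'] 14
```

Greedy is lured by the heaviest subset C3 and ends at 12/14 ≈ 0.86 of the optimum, which is still well above the worst-case 1 − 1/e guarantee.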
Geometric Max k-cover for Balls
⊲ Medial axis of the domain F_O, associated covering F_C, and induced arrangement of balls A
[Figure: a planar domain, its medial axis (vertices m_1, m_2), the covering balls, and the cells c_1, ..., c_7 of the induced arrangement]
⊲ Given a function defined on the cells of A:
– Maximize the weight of a selection of k cells
– Two cases: volume vs surface arrangements. For the latter: cf role of the MA w.r.t. F_C = ∪_i B_i
⊲ Complexity: geometric versions of max k-cover
⊲ Ref: Amenta, Kolluri; CGTA; 2001
⊲ Ref: Feige; J. ACM; 1998
Inner Approximation
⊲ Punchline:
– The first provably correct volume-based approximation algorithm of 3D shapes, which works in a finite setting (≠ the ε-sample framework)
⊲ Thm. The MAT of a union of balls is discrete in the following sense:
$$F_C = \bigcup_i B_i = \bigcup_{v \in V} B^*_v, \qquad (3)$$
with V the vertices of the medial axis.
⊲ Cor. The 3D arrangement induced by balls in V can be used to run greedy algorithms.
⊲ Thm. The Greedy strategy for positive volume weights has the following approximation ratios:
$$\begin{cases} 1 - (1 - 1/k)^k > 1 - 1/e & \text{w.r.t. the OPT weight (volume)} \\ 1 - (1 - 1/n)^k & \text{w.r.t. the total weight (volume)} \end{cases} \qquad (4)$$
⊲ Obs. The Greedy strategy for positive surface weights can be as bad as 1/k².
⊲ Ref: Cazals, Dreyfus, Sachdeva, Shah; Comp. Graphics Forum, 2014
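The two ratios of Eq. (4) are easy to evaluate numerically. The short Python sketch below (function names are illustrative) tabulates the first ratio for a few values of k and compares it with the limit 1 − 1/e.

```python
import math


def greedy_ratio_vs_opt(k):
    """Guarantee of Eq. (4) w.r.t. the optimal weight: 1 - (1 - 1/k)^k."""
    return 1.0 - (1.0 - 1.0 / k) ** k


def greedy_ratio_vs_total(n, k):
    """Guarantee of Eq. (4) w.r.t. the total weight: 1 - (1 - 1/n)^k."""
    return 1.0 - (1.0 - 1.0 / n) ** k


limit = 1.0 - 1.0 / math.e  # about 0.632
for k in (1, 2, 5, 10, 100):
    print(k, round(greedy_ratio_vs_opt(k), 4), greedy_ratio_vs_opt(k) > limit)
```

The ratio w.r.t. OPT decreases towards 1 − 1/e as k grows (it equals 1 for k = 1 and 0.75 for k = 2), so small budgets come with a better guarantee than the asymptotic bound.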
Robust Implementation of Greedy for the Volume Case: A High-profile Implementation ⊲ Delaunay triangulation (DT) DTB of the input balls ⊲ Delaunay triangulation DTV of the boundary points of ∂ F C – Points have degree two algebraic coordinates – Degeneracies to be handled (e.g. n > 3 coplanar points) ⊲ Medial axis of the input balls – Voronoi diagram DTV ∗ clipped by the α -shape of DTB ⊲ MAT restricted to vertices of the MA ⊲ Volume computations to run greedy ⊲ Ref: De Castro and F. Cazals and S. Loriot and M. Teillaud; CGTA; 2009 ⊲ Ref: Cazals and H. Kanhere and S. Loriot; ACM TOMS; 2011
Outlook
⊲ Pros
– Flexible framework to design approximations
– Inner / outer / volume preserving approximations
– The molecule or complex can be processed as a whole or can be decomposed into regions processed independently
⊲ Geometric models produced can be complemented by
– Connectivity information
– Biophysical properties
References ◮ F. Cazals and T. Dreyfus and S. Sachdeva and N. Shah, Greedy Geometric Algorithms for Collections of Balls, with Applications to Geometric Approximation and Molecular Coarse-Graining, Computer Graphics Forum, 2014.
Overview PART 1:Connectivity Inference from Native Mass Spectrometry Data PART 2:Building Coarse Grain Models PART 3:Handling uncertainties in Macro-molecular Assembly Models PART 4:Conformational Ensembles and Energy Landscapes: Analysis PART 5:Conformational Ensembles and Energy Landscapes: Comparison
Assessing the Reconstruction of Macro-molecular Assemblies with Toleranced Models
Frederic Cazals, Tom Dreyfus, Inria ABS
Valerie Doye, Inst. J. Monod
Algorithms - Biology - Structure project-team, INRIA Sophia Antipolis, France
[Figure: dual complex of a compoundly weighted Voronoi diagram]
Modeling Contacts in Macro-molecular Assemblies Introduction Voronoi Diagrams Compoundly Weighted Voronoi Diagrams and their λ -Complex Assessing the Reconstruction of Macro-Molecular Assemblies Probing assemblies With Graphical Models Conclusion and Perspectives
Structural Dynamics of Macromolecular Processes
Reconstructing Large Macro-molecular Assemblies
⊲ Examples: molecular motors, NPC, actin filaments, chaperonins, virions, ATP synthase
– Bacterial flagellum: rotary propeller
– Nuclear Pore Complex: nucleocytoplasmic transport
– Branched actin filaments: muscle contraction, cell division
– Chaperonin cavity: protein folding
– Maturing virion: HIV-1 core assembly
– ATP synthase: synthesis of ATP in mitoch. and chloroplasts
⊲ Difficulties
– Modularity
– Flexibility
⊲ Core questions
– Reconstruction / animation
– Integration of (various) experimental data
– Coherence model vs experimental data
⊲ Ref: Russel et al, Current Opinion in Cell Biology, 2009
Reconstructing Large Assemblies: a NMR-like Data Integration Process
⊲ Four ingredients
– Experimental data
– Model: collection of balls
– Scoring function: sum of restraints (restraint: function measuring the agreement "model vs exp. data")
– Optimization method (simulated annealing, ...)
⊲ Restraints, experimental data and ... ambiguities:
Restraint | Experimental data | Ambiguity
Assembly: shape | cryo-EM | fuzzy envelopes
Assembly: symmetry | cryo-EM | idem
Assembly: sub-systems | mass spec. | stoichiometry
Complexes: interactions | TAP (Y2H, overlay assays) | stoichiometry
Instance: shape | Ultra-centrifugation | rough shape (ellipsoids)
Instances: locations | Immuno-EM | positional uncertainties
⊲ Ref: Alber et al, Ann. Rev. Biochem. 2008 + Structure 2005
Checkpoint
⊲ Consider a real valued function:
$$f(x, y, z) : \mathbb{R}^3 \longrightarrow \mathbb{R} \qquad (5)$$
What is, in general, the locus of points defined as follows:
$$S = \{\, p = (x, y, z) \in \mathbb{R}^3 \mid f(p) = c \,\} \qquad (6)$$
Morse Homology: Illustration
⊲ Example: evolving homology of a 3D landscape defined by a polynomial
$$P = (x^2 + y^2 + z - 1)^2 + (z^2 + y^2 + x - 3)^2 + (x^2 + z^2 + y - 2)^2$$
– CP#8, index 1: (1, 0, 0) → (1, 1, 0)
– CP#9, index 2: (1, 1, 0) → (1, 0, 0)
⊲ Key construction: the Morse-Smale(-Witten) chain complex, i.e. the connections between critical points whose indices differ by one, is sufficient to compute the Betti numbers
⊲ Ref: R. Thom, Sur une partition en cellules...; CRAS; 1949
⊲ Ref: S. Smale; Differentiable dynamical systems; Bull. AMS; 1967
⊲ Ref: R. Bott, Morse theory indomitable, Pub. IHES, 1988
Uncertain Data and Toleranced Models: the Example of Molecular Probability Density Maps
⊲ Probability Density Map of a Flexible Molecule
– Each point of the probability density map: probability of being covered by a conformation
⊲ Question: How does one accommodate high/low density regions?
⊲ Toleranced ball S_i
– Two concentric balls of radius $r_i^- < r_i^+$:
inner ball $S_i[r_i^-]$: high confidence region
outer ball $S_i[r_i^+]$: low confidence region
⊲ A continuum of models
– Linear interpolation of radii: $r_i(\lambda) = r_i^- + \lambda (r_i^+ - r_i^-)$
– Tracking intersections of $S_i[r_i(\lambda)]$: → Voronoi diagram of toleranced balls
Voronoi diagrams in Biology, Geology, Engineering
[Figure: Voronoi regions Vor(B_1), ..., Vor(B_7) of seven balls with centers c_1, ..., c_7]
⊲ Ref: Cazals, Dreyfus; Symp. on Geometry Processing, 2010
The α -complex: Demo VIDEO/ashape-two-cc-cycle-video.mpeg ⊲ α -complex – simplicial complex encoding the topology of growing balls – multi-scale analysis of a collection of balls how many clusters / clusters’ stability? topology of the clusters?
Euclidean Voronoi diagram and α-complex
⊲ Voronoi diagram of S = {x_i}
– Voronoi region Vor(x_i): $\{\, p \mid d(p, x_i) < d(p, x_j),\ i \neq j \,\}$
⊲ Dual complex K(S)
– Delaunay triangulation (Euclidean case)
– Simplex ∆: dual of $\bigcap_{x_i \in \Delta} Vor(x_i) \neq \emptyset$
⊲ α-complex K_α(S)
– Grown spheres: $S_{i,\alpha} = S_i(x_i, \alpha)$
– Restricted Voronoi region: $R_{i,\alpha} = S_{i,\alpha} \cap Vor(x_i)$
– ∆ ∈ K_α(S): $\bigcap_{x_i \in \Delta} R_{i,\alpha} \neq \emptyset$
⊲ α-complex: topological changes induced by a growth process
[Figure: three points x_1, x_2, x_3 with their Voronoi diagram, dual complex, and α-complex for two values of α]
Growth Processes and Curved Voronoi diagrams
⊲ Power diagram: $d(S(c, r), p) = \|c - p\|^2 - r^2$
⊲ Möbius diagram: $d(S(c, \mu, \alpha), p) = \mu \|c - p\|^2 - \alpha^2$
⊲ Apollonius diagram: $d(S(c, r), p) = \|c - p\| - r$
⊲ Compoundly Weighted Voronoi diagram: $d(S(c, \mu, \alpha), p) = \mu \|c - p\| - \alpha$
[Figure: dual complex of a compoundly weighted Voronoi diagram]
⊲ Ref: Boissonnat, Wormser, Yvinec; in Effective Comp. Geom.; 2006
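The four generalized distances differ only in how the weights enter, which a small Python sketch makes explicit. The function names and the two example sites are hypothetical; `nearest_site` simply reports which site's Voronoi region contains a query point under a given distance.

```python
import math


def power_dist(c, r, p):
    """Power diagram: ||c - p||^2 - r^2."""
    return math.dist(c, p) ** 2 - r ** 2


def mobius_dist(c, mu, alpha, p):
    """Möbius diagram: mu * ||c - p||^2 - alpha^2."""
    return mu * math.dist(c, p) ** 2 - alpha ** 2


def apollonius_dist(c, r, p):
    """Apollonius diagram: ||c - p|| - r."""
    return math.dist(c, p) - r


def cw_dist(c, mu, alpha, p):
    """Compoundly weighted diagram: mu * ||c - p|| - alpha."""
    return mu * math.dist(c, p) - alpha


def nearest_site(p, sites, dist):
    """Index of the site whose generalized distance to p is smallest."""
    return min(range(len(sites)), key=lambda i: dist(*sites[i], p))


# Hypothetical sites (center, radius): a small ball and a larger one.
balls = [((0.0, 0.0), 1.0), ((4.0, 0.0), 2.0)]
midpoint = (2.0, 0.0)  # equidistant from both centers in the Euclidean sense
print(nearest_site(midpoint, balls, apollonius_dist))  # 1: the larger ball wins
print(nearest_site((1.2, 0.0), balls, apollonius_dist))  # 0
```

With additive weights only (power, Apollonius) bisectors stay relatively simple; the multiplicative weight μ is what bends them further in the Möbius and compoundly weighted cases.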
From Toleranced Balls to Compoundly Weighted Points and Compoundly Weighted Voronoi Diagrams
⊲ Toleranced ball $S_i(c_i; r_i^-; r_i^+)$ and radius interpolation:
– Radius discrepancy: $\delta_i = r_i^+ - r_i^-$
– Grown ball $S_i[\lambda](c_i, r_i(\lambda))$ with $r_i(\lambda) = r_i^- + \lambda \delta_i$
⊲ Growing ball swallowing a point p:
– p is at the surface of $S_i[\lambda]$ ⇔ $r_i(\lambda) = \|c_i p\|$ ⇔ $\lambda = \dfrac{\|c_i p\| - r_i^-}{\delta_i}$
⊲ From Toleranced Ball to Compoundly Weighted Point:
– $S_i\big(c_i;\ \mu_i = \tfrac{1}{\delta_i},\ \alpha_i = \tfrac{r_i^-}{\delta_i}\big)$
– $\lambda(S_i, p) = \dfrac{1}{\delta_i} \|c_i p\| - \dfrac{r_i^-}{\delta_i}$
The Voronoi Diagram induced by Toleranced Balls is the Compoundly Weighted one!
[Figure: a toleranced ball with center c_i, radii r_i^- and r_i^+, grown to radius r_i(λ) through a point p]
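The change of variables from a toleranced ball to a compoundly weighted point is short enough to check numerically. The following Python sketch uses a hypothetical `TolerancedBall` class (not the API of the authors' library) and verifies that the ball grown to λ(S_i, p) has p on its surface.

```python
import math
from dataclasses import dataclass


@dataclass
class TolerancedBall:
    center: tuple
    r_minus: float  # inner radius r_i^-
    r_plus: float   # outer radius r_i^+

    @property
    def delta(self):
        """Radius discrepancy delta_i = r_i^+ - r_i^-."""
        return self.r_plus - self.r_minus

    def radius(self, lam):
        """Interpolated radius r_i(lambda) = r_i^- + lambda * delta_i."""
        return self.r_minus + lam * self.delta

    def cw_weights(self):
        """Compound weights (mu_i, alpha_i) = (1/delta_i, r_i^-/delta_i)."""
        return 1.0 / self.delta, self.r_minus / self.delta

    def lam_of(self, p):
        """lambda(S_i, p) = mu_i * ||c_i p|| - alpha_i: value at which p is swallowed."""
        mu, alpha = self.cw_weights()
        return mu * math.dist(self.center, p) - alpha


# Hypothetical ball: inner radius 1, outer radius 3, query point at distance 2.
ball = TolerancedBall(center=(0.0, 0.0, 0.0), r_minus=1.0, r_plus=3.0)
p = (2.0, 0.0, 0.0)
lam = ball.lam_of(p)
print(lam, ball.radius(lam))  # 0.5 2.0
```

Negative values of λ(S_i, p) correspond to points already inside the inner ball, and values above 1 to points outside the outer ball.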
Bisectors
⊲ Rationale from the Euclidean Voronoi diagram:
– Bisector ζ_{i,j} of (x_i, x_j): centers of circumscribed balls to x_i and x_j
⊲ Generalization to the CW case:
– Bisector ζ_{i,j} of (S_i, S_j): centers of toleranced tangent balls to S_i and S_j ⇒ degree four algebraic surface
– Extremal toleranced tangent balls:
smallest one of radius ρ ⇒ first intersection of S_{i_0}[ρ], ..., S_{i_k}[ρ]
largest one of radius ρ ⇒ last intersection of S_{i_0}[ρ], ..., S_{i_k}[ρ]
[Figure: bisector ζ_{i,j} of two points x_i, x_j, and of two toleranced balls S_i, S_j]
Voronoi Diagram and its Dual Complex: Topological Complications
⊲ Partition of the ambient space: $Vor(S_i) = \{\, p \in \mathbb{R}^3 \mid \lambda(S_i, p) \leq \lambda(S_j, p) \,\}$
⊲ Voronoi region, in all generality:
– Neither connected: collection of faces
– Nor simply connected
⊲ Dual complex:
– Not a triangulation → abstract representation with a Hasse diagram
– abstract edges without triangle: hole in Voronoi region. Ex. (Top): ∆(1, 3)
– ≠ abstract triangles sharing two edges: lens sandwiched Voronoi region (Apollonius case). Ex. (Top): ∆_1(0, 1, 2) and ∆_2(0, 1, 2)
– ≠ abstract triangles sharing the same edges: composed hole in Voronoi region. Ex. (Bottom): ∆_1(1, 4, 5) and ∆_2(1, 4, 5)
[Figure: (Top) Voronoi regions ∆(0), ..., ∆(3); (Bottom) dual complex of a compoundly weighted Voronoi diagram]
Compoundly Weighted Filtration: the λ-complex
⊲ Definition. λ-complex K_λ:
– sub-complex of the dual complex
– ∆ ∈ K_λ: $\bigcap_{S_i \in \Delta} R_{i,\lambda} \neq \emptyset$ → map λ to ∆
⊲ Status of ∆ ∈ K_λ and boundary ∂S[λ]:
– singular: $\bigcap_{S_i \in \Delta} S_i[\lambda] \in \partial S[\lambda]$. Ex. ∆_{1,3}
– regular: $\bigcap_{S_i \in \Delta} R_{i,\lambda} \in \partial S[\lambda]$. Ex. ∆_{3,4}
– interior: $\bigcap_{S_i \in \Delta} R_{i,\lambda} \notin \partial S[\lambda]$. Ex. ∆_{2,3}
⊲ Classification of ∆(T_k): intervals of λ for the statuses singular / regular / interior
(1) ∆(T) ∈ CH(S), Gabriel, non dominated/dominant: (ρ_∆(T), μ_∆(T)] (μ_∆(T), +∞]
(2) ∆(T) ∈ CH(S), non Gabriel, non dominated/dominant: (μ_∆(T), +∞]
(3) ∆(T) ∉ CH(S), Gabriel, non dominated/dominant: (ρ_∆(T), μ_∆(T)] (μ_∆(T), μ_∆(T)] (μ_∆(T), +∞]
(4) ∆(T) ∉ CH(S), non Gabriel, non dominated/dominant: (μ_∆(T), μ_∆(T)] (μ_∆(T), +∞]
(5) ∆(T) ∉ CH(S), Gabriel, dominant: (ρ_∆(T), μ_∆(T)] (μ_∆(T), ρ_∆(T)] (ρ_∆(T), +∞]
(6) ∆(T) ∉ CH(S), non Gabriel, dominant: (μ_∆(T), ρ_∆(T)] (ρ_∆(T), +∞]
(7) ∆(T) ∉ CH(S), Gabriel, dominated: (ρ_∆(T), μ_∆(T)] (μ_∆(T), γ_∆(T)] (γ_∆(T), +∞]
(8) ∆(T) ∉ CH(S), non Gabriel, dominated: (μ_∆(T), γ_∆(T)] (γ_∆(T), +∞]
[Figure: dual complex of a compoundly weighted Voronoi diagram]
Algorithms
⊲ Naively enumerating candidate tuples:
– a tuple of toleranced balls: a pair, triple or quadruple
– candidate: possibly contributing simplices
⊲ Computing the CW Dual Complex:
– Iterative construction of the skeleton, from tetrahedra to vertices
⊲ Time complexity: O(n(n² + τ)), with τ the number of candidate tuples
[Figure: time (s), # candidate tuples and # simplices as a function of the number of toleranced balls (random toleranced balls, 0 to 1000)]
⊲ Difficulties:
– comparing roots of degree four polynomials
– checking that extremal TT balls are conflict-free
– computing the dual of a non connected Voronoi region: disambiguating the neighborhood of dual simplices
Multi-scale Analysis of Toleranced Models: Protein Contact History Encoded in the Hasse Diagram
[Figure: three toleranced proteins p_1, p_2, p_3 grown from λ = 0 to λ = 1; the skeleton graphs (i_A), (i_B), (i_C) appear at λ_A ∼ .1, λ_B ∼ .4, λ_C ∼ .9]
⊲ Red-blue bicolor setting: red proteins are types singled out (e.g. TAP)
⊲ Protein contact history: Hasse diagram
⊲ Finite set of topologies: encoded into a Hasse diagram
– Birth and death of a complex
– Topological stability of a complex: s(C) = λ_d(C) − λ_b(C)
⊲ Computation: via intersection of Voronoi restrictions
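The birth / death bookkeeping behind the stability s(C) = λ_d(C) − λ_b(C) can be illustrated with a deliberately simplified Python sketch. It is an assumption-laden stand-in for the method of the slides: each protein is a single toleranced ball, contacts are detected by pairwise ball intersection rather than via Voronoi restrictions, and only merges of connected components are tracked with a union-find. All names and the three example balls are hypothetical.

```python
import math
from itertools import combinations


def contact_lambda(b1, b2):
    """Smallest lambda >= 0 at which two toleranced balls (c, r_minus, r_plus) touch:
    r_1(lambda) + r_2(lambda) = ||c_1 c_2||."""
    (c1, rm1, rp1), (c2, rm2, rp2) = b1, b2
    gap = math.dist(c1, c2) - rm1 - rm2
    return max(0.0, gap / ((rp1 - rm1) + (rp2 - rm2)))


def merge_history(balls, lam_max=1.0):
    """List of (lambda, members) events: each event is the birth of a complex."""
    parent = list(range(len(balls)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = sorted((contact_lambda(balls[i], balls[j]), i, j)
                   for i, j in combinations(range(len(balls)), 2))
    events = []
    for lam, i, j in pairs:
        if lam > lam_max:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            members = sorted(k for k in range(len(balls)) if find(k) == ri)
            events.append((lam, members))
    return events


def stabilities(events, lam_max=1.0):
    """Stability of each complex: death (absorption into a larger one, or lam_max) - birth."""
    out = []
    for idx, (birth, members) in enumerate(events):
        death = next((lam for lam, later in events[idx + 1:]
                      if set(members) < set(later)), lam_max)
        out.append((members, death - birth))
    return out


# Hypothetical collinear balls: p1-p2 touch at lambda ~ 0.1, p2-p3 at ~ 0.4.
balls = [((0.0, 0.0), 1.0, 2.0), ((2.2, 0.0), 1.0, 2.0), ((5.0, 0.0), 1.0, 2.0)]
events = merge_history(balls)
print(events)               # births: {p1, p2} near 0.1, {p1, p2, p3} near 0.4
print(stabilities(events))  # stabilities near 0.3 and 0.6
```

A complex with a large stability persists over a wide range of λ, which is the sense in which the Hasse diagram supports a multi-scale reading of contacts.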
Voratom: Assessing Contacts in the Toleranced Model of a Large Assembly ⊲ 3 steps: – Building occupancy volumes – Building a Toleranced Model – Inferring the Hasse diagram encoding protein contacts VIDEO/voratom-y-complex-long.mpeg Nup120 Nup133 Nup84
Toleranced Models for the NPC
⊲ Input: 30 probability density maps from Sali et al.
⊲ Output: 456 toleranced proteins
⊲ Rationale: → assign protein instances to pronounced local maxima of the maps
⊲ Geometry of instances:
– four canonical shapes
– controlling $r_i^+ - r_i^-$: w.r.t. volume estimated from the sequence
[Figure: (i) Canonical shapes (labels: Sec13, Nup120, Nup84, Nup133, Pom152); (ii) NPC at λ = 0; (iii) NPC at λ = 1]
Stopping the Growth Process
Matching the Uncertainties on the Input Data
⊲ Uncertainty of a density map:
$$\frac{\text{Volume of voxels with probability} > 0}{\text{Stoichiometry} \times \text{Reference volume}}$$
[Figure: probability density maps sorted by molecular weight]
Three Analyses of the Toleranced Model of an Assembly
⊲ Local:
– Tracking copies of sub-complexes in the assembly → Hasse diagram
⊲ Global:
– Inspecting pairwise protein contacts → Contact probabilities
– Controlling the volume of evolving complexes → Volume ratio
Putative Models of Sub-complexes: the Y-complex
⊲ Symmetric core of the NPC
– Pore membrane: Pom152, Pom34, Ndc1
– Coat nups: Nup133, Nup84, Nup145C, Sec13, Nup120, Nup85, Seh1
– Adapter nups: Nic96, Nup192, Nup188, Nup157, Nup170
– Channel nups: Nsp1, Nup49, Nup57
⊲ The Y-complex: pairwise contacts
[Figure: skeleton graph of the Y-complex on Nup85, Seh1, Nup120, Sec13, Nup145C, Nup84, Nup133]
⊲ Ref: Blobel et al; Cell; 2007
⊲ Ref: Blobel et al; Nature SMB; 2009
⊲ Y-based head-to-tail ring vs. upward-downward pointing
[Figure: spoke and half-spoke between cytoplasm and nucleus]
⊲ Ref: Seo et al; PNAS; 2009
⊲ Ref: Brohawn, Schwartz; Nature SMB; 2009
⇒ Bridging the gap between both classes of models?
Assessment w.r.t. a Set of Protein Types: Isolated Copies
Geometry, Topology, Biochemistry
⊲ Input:
– Toleranced model
– T: set of protein types, the red proteins (types involved in a sub-complex)
⊲ Output, overall assembly:
– number of isolated copies: symmetry analysis
– their topological stability: death date − birth date (cf α-shape demo)
⊲ B: closure of the 2 rings; C: painting Nup133 in blue
Closure of the Two Rings Involving Y-complexes: Pairwise Contacts
⊲ The TOM supports Blobel's hypothesis
[Figure: the two rings at λ = 0 and at λ = 0.66]
⊲ Events accounting for the closure:
– 9 (Nup133, Nup85): λ ∈ [0.09, 0.70]
– 5 (Nup84, Nup85): λ ∈ [0.52, 0.69]
– 1 (Nup133, Nup120): λ = 0
– 1 (Nup84, Nup120): λ = 0.06
– Nup85 involved in 14/16 contacts
⊲ Inner structure of the Y-complexes into two sub-units: (Nup84, Nup145C, Nup133) and (Nup120, Nup85, Seh1)
[Figure: density maps (contour plot) and Hasse diagram per sub-unit]
Three Analyses of the Toleranced Model of an Assembly
⊲ Local:
– Tracking copies of sub-complexes in the assembly → Hasse diagram
⊲ Global:
– Inspecting pairwise protein contacts → Contact probabilities
– Controlling the volume of merging complexes → Volume ratio
Contact Frequencies versus Contact Probabilities: Definitions
⊲ Contact frequency f_ij from Sali et al
– Given N optimized bead models of the NPC: f_ij is the fraction of the N models with at least one contact (P_i, P_j)
⊲ Contact probability $p^{(k)}_{ij}$
– Consider: the Hasse diagram for λ ∈ [0, λ_max], and a stoichiometry k ≥ 1
– Define: λ_k(P_i, P_j): smallest λ such that there exist k contacts between P_i and P_j
– Contact proba.: $p^{(1)}_{ij} = \big(\lambda_{\max} - \lambda_1(P_i, P_j)\big) / \lambda_{\max}$
– Contact curve: $p^{(k)}_{ij}$ as a function of k
⊲ Example: with λ_max = 1 and λ_1(P_1, P_3) ∼ .9: $p^{(1)}_{1,3} \sim (1 - 0.9)/1 = 0.1$
[Figure: contact curve, with the stoichiometries k_high, k_drop, k_low and the drop $\delta p^{(k_{drop})}_{ij}$]
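A contact curve can be sketched from the list of λ values at which the individual contacts between instances of P_i and P_j appear. In the Python sketch below, two points are assumptions rather than statements of the slides: the formula given for k = 1 is applied unchanged to every k, and the probability is set to 0 when fewer than k contacts exist within [0, λ_max]. Function names and the example values are hypothetical.

```python
def contact_probability(contact_lambdas, k=1, lam_max=1.0):
    """(lam_max - lambda_k) / lam_max, with lambda_k the k-th smallest contact lambda.

    Assumption: returns 0.0 when fewer than k contacts occur within [0, lam_max].
    """
    lams = sorted(lam for lam in contact_lambdas if lam <= lam_max)
    if len(lams) < k:
        return 0.0
    return (lam_max - lams[k - 1]) / lam_max


def contact_curve(contact_lambdas, k_max, lam_max=1.0):
    """Contact probabilities for k = 1, ..., k_max."""
    return [contact_probability(contact_lambdas, k, lam_max)
            for k in range(1, k_max + 1)]


# Example of the slide: a single contact (P1, P3) appearing at lambda ~ 0.9.
p_13 = contact_probability([0.9], k=1)
print(round(p_13, 6))  # 0.1

# Hypothetical pair with four early contacts and one late contact.
curve = contact_curve([0.0, 0.0, 0.05, 0.1, 0.8], k_max=6)
print([round(p, 2) for p in curve])  # [1.0, 1.0, 0.95, 0.9, 0.2, 0.0]
```

Since λ_k is non-decreasing in k, the curve is non-increasing, and a sharp drop marks the stoichiometry beyond which contacts stop being well supported.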
Contact Frequencies versus Contact Probabilities: Results
⊲ Under-represented contact in Sali et al: Nup84 − Nup60: f_ij = 0.07
⊲ Corresponding contact curve: Nup84 − Nup60: $p^{(4)}_{ij} = 1$
[Figure: contact curve, with $p^{(k_{high})}_{ij} = p^{(k_{drop})}_{ij}$, the drop $\delta p^{(k_{drop})}_{ij}$, and k_high = k_drop = k_low]
⊲ Over-represented contact in Sali et al: Nup192 − Pom152: f_ij = 0.98
⊲ Corresponding contact curve: Nup192 − Pom152: $p^{(1)}_{ij} = 0$
[Figure: contact curve, with $p^{(k_{high})}_{ij} = p^{(k_{drop})}_{ij} = 0$]
Three Analyses of the Toleranced Model of an Assembly
⊲ Local:
– Tracking copies of sub-complexes in the assembly → Hasse diagram
⊲ Global:
– Inspecting pairwise protein contacts → Contact probabilities
– Controlling the volume of merging complexes → Volume ratio
Assessment w.r.t. a Set of Protein Types: Volume Ratios
⊲ Definition: reference volume of
– a protein: volume estimated from its sequence of amino-acids
– a complex: sum of reference volumes of its constituting proteins
⊲ Output, per complex:
– volume ratio: volume occupied vs. expected volume
⊲ Output, in conjunction with the Hasse diagram:
– curve: evolution of volume ratio of evolving complexes
[Figure: complexes in the Hasse diagram: variation of the volume ratio as a function of λ]
⊲ Ref: Harpaz, Gerstein, Chothia; Structure; 1994
Assessing a Toleranced Model with Respect to a High-resolution Structural Model
[Figure: an assembly; the skeleton graph of a complex; the skeleton graph of a template (Y-complex: Nup85, Seh1, Nup120, Sec13, Nup145C, Nup84, Nup133)]
⊲ Matching between a Complex and a Template:
– Protein instance ↔ Protein type
– Contact ↔ Contact
⊲ Exact superposition: Perfect Matching
⊲ Approximate superposition: Alternate Matching (in the example: 4 extra edges, 1 missing edge)
Assessment w.r.t. a High-resolution Structural Model: Contact Analysis
⊲ Input: two skeleton graphs
– template G_t, the red proteins: contacts within an atomic resolution model
– complex G_C: skeleton graph of a complex of a node of the Hasse diagram
⊲ Output: graph comparison, complex G_C versus template G_t: (common/missing/extra) × (proteins/contacts)
⊲ Graph theory problems:
– Perfect Matching: All Maximal Common Induced Sub-graphs (MCIS)
– Alternate Matching: All Maximal Common Edge Sub-graphs (MCES)
[Figure: three cases (Missing Protein Types; Missing and Extra Contacts; Perfect Matching), each showing a complex G_C on instances p_1, ..., p_4, the template G_t|C on types c_1, ..., c_4, and the matched pairs (p_i, c_j)]
⊲ Ref: Cazals, Karande; Theoretical Computer Science; 349 (3), 2005
⊲ Ref: Koch; Theoretical Computer Science; 250 (1-2), 2001
A New Template for the T-complex
⊲ T-complex and its skeletons: T-core: (Nic96, Nsp1); T-leg: (Nup49, Nup57)
⊲ Putative positions: note the filaments w.r.t. the inner ring of the NPC
⊲ Perfect Matching:
– G_t(T): 0 matchings with T-complex → Extra contacts (Nup49, Nsp1)
– G_t(T_comp): 2 matchings with T-complex → Missing contacts (Nup57, Nic96)
– G_t(T_new): 10 matchings with T-complex → Best coherence with toleranced model
[Figure: template graphs G_t(T), G_t(T_comp) and G_t(T_new) on Nic96, Nsp1, Nup49, Nup57]
⊲ Contact analysis: asymmetric role of Nup49 and Nup57; new template
Conclusion and Outlook
⊲ Compoundly Weighted Voronoi diagram
– Geometric and topological analysis
– Output sensitive algorithm
– λ-complex and its computation
⊲ Toleranced models and their applications
– Representing models with uncertainties
– Bridging the gap: global, fuzzy models versus local, atomic resolution models
⊲ Reconstruction assessment
– A panoply of tools to perform the assessment of large protein assembly models
– ... of interest in a virtuous loop reconstruction / assessment
⊲ Software
– Algorithms to compute the CW diagram and the λ-complex (CGAL-style)
– A generic C++ library for modeling and assessing large assemblies
[Figure: recap of the dual complex of a CW Voronoi diagram, the skeleton graphs of a growing toleranced model, and the Y-complex matching (4 extra edges, 1 missing edge)]
Perspectives
⊲ Compoundly Weighted Voronoi diagram
– Study of homological features (Euler characteristic)
– Faster computation (Incremental algorithm)
⊲ Toleranced models
– Enhanced approximation of protein shapes
– Interest of other non linear growth models (e.g. Möbius)
⊲ Applications
– Toleranced models in a different context (e.g. cryoEM or crystal structures)
– Reconstruction by data integration and model selection
Toleranced Models for Large Assemblies: Positioning
⊲ Methodology: modeling with uncertainties
– Toleranced models: continuum of shapes vs fixed shapes
– Topological and geometric stability assessment (curved α-shapes)
⊲ Applications to toleranced complexes
– Protein types (contact probabilities)
– Protein complexes (morphology, contacts)
⊲ Experimental data
• Mass spectrometry
• TAP, Y2H, etc
• Collision X section
• Cryo-EM
• High-res. structures
• Immuno-EM
• ...
⊲ Data processing
• Stoichiometry determination
• Connectivity inference
• Interface modeling
• Approximating complex shapes
• Mining density maps
• ...
⊲ Reconstruction
• IMP
• Bayesian approaches
• ...
⊲ Fuzzy models
• Qualitative results
• Not mechanistical
• Assessment with TOM
– For Protein types
– For Protein complexes
• Model selection
http://team.inria.fr/abs
References
◮ Modeling Macro-molecular Complexes: a Journey Across Scales, in Modeling in Computational Biology and Biomedicine: a Multi-disciplinary Endeavor, F. Cazals and P. Kornprobst (Editors), Springer, 2012.
◮ F. Cazals and T. Dreyfus, Multi-scale Geometric Modeling of Ambiguous Shapes with Toleranced Balls and Compoundly Weighted alpha-shapes, Computer Graphics Forum (SGP), 29(5): 1713–1722, 2010.
◮ T. Dreyfus, V. Doye, and F. Cazals, Probing a Continuum of Macro-molecular Assembly Models with Graph Templates of Sub-complexes, Proteins: Structure, Function, and Bioinformatics, 81(11), 2013.
◮ T. Dreyfus, V. Doye, and F. Cazals, Assessing the Reconstruction of Macro-molecular Assemblies with Toleranced Models, Proteins: Structure, Function, and Bioinformatics, 80(9), 2012.
◮ F. Cazals and C. Karande, A Note on the Problem of Reporting Maximal Cliques, Theoretical Computer Science, 407(1–3), 2008.
Overview
PART 1: Connectivity Inference from Native Mass Spectrometry Data
PART 2: Building Coarse Grain Models
PART 3: Handling Uncertainties in Macro-molecular Assembly Models
PART 4: Conformational Ensembles and Energy Landscapes: Analysis
PART 5: Conformational Ensembles and Energy Landscapes: Comparison
Conformational Ensembles and Energy Landscapes: Analysis
F. Cazals, A. Roth, T. Dreyfus
C. Robert, IBPC / CNRS
Modeling Contacts in Macro-molecular Assemblies
⊲ Landscapes: Intuitions
⊲ Example Test System: BLN69
⊲ Landscapes: Multiscale Topographical Analysis
Analyzing Landscapes
⊲ Energy landscape
◮ Input: point set + energies
◮ Output: minima, saddles, attraction basins
⊲ Density estimates
◮ Input: point set
◮ Output: one cluster per significant local maximum
⊲ Common points:
◮ Input consists of a set of points / conformations
◮ The elevation defines a landscape
◮ Neighbors are used to define a graph / estimate a density
[Figures: energy profile E; density estimate with two clusters (Cluster one, Cluster two)]
Landscapes and Peaks: What is a Peak?!
⊲ Key features in a landscape: lakes, peaks, passes
– local minima, maxima, and saddles of the elevation function
⊲ Defining a peak . . . a matter of scales
– prominence: distance to the nearest local maximum with higher elevation (called topographic isolation in English-language usage)
– culminance: elevation drop to the saddle leading to a higher local maximum (called topographic prominence in English-language usage)
⊲ Some well-known peaks have tame statistics: the Nordend
– fourth highest peak of the Monte Rosa massif, 4609 meters
– prominence: 575 meters; culminance: 94 meters
⊲ Ref: http://www.zermatt.ch/en/page.cfm/zermatt_matterhorn/4000er/nordend
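The two statistics above can be illustrated on a one-dimensional elevation profile. This is a minimal sketch, not taken from the slides: the function names and the brute-force scans are assumptions made for illustration, and distances are counted in sample indices rather than meters.

```python
def local_maxima(profile):
    """Indices of strict interior local maxima of a 1D elevation profile."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i - 1] < profile[i] > profile[i + 1]]


def culminance(profile, i):
    """Elevation drop from peak i to the highest saddle leading to higher ground.

    Returns None when no higher ground exists (the highest peak).
    """
    h = profile[i]
    saddles = []
    for step in (-1, 1):              # scan left, then right
        j, lowest = i + step, h
        while 0 <= j < len(profile) and profile[j] <= h:
            lowest = min(lowest, profile[j])
            j += step
        if 0 <= j < len(profile):     # strictly higher ground was reached
            saddles.append(lowest)
    return h - max(saddles) if saddles else None


def prominence_distance(profile, i):
    """Distance (in samples) to the nearest higher local maximum, as defined on the slide."""
    higher = [abs(j - i) for j in local_maxima(profile) if profile[j] > profile[i]]
    return min(higher) if higher else None
```

A peak with a large distance but a small drop, like the Nordend, is isolated yet barely separated from its higher neighbor; both numbers are needed to decide whether it counts as a peak at a given scale.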
BLN69: a Simplified Protein Model
⊲ Description:
– Three types of beads: hydrophobic (B), hydrophilic (L), and neutral (N)
– Configuration space of intermediate dimension: 207
– Challenging: frustrated system
– Exhaustively studied: DB of ∼ 450k critical points
$$
V_{BLN} = \frac{1}{2} K_r \sum_{i=1}^{N-1} (R_{i,i+1} - R_e)^2
        + \frac{1}{2} K_\theta \sum_{i=1}^{N-2} (\theta_i - \theta_e)^2
        + \epsilon \sum_{i=1}^{N-3} \left[ A_i (1 + \cos\varphi_i) + B_i (1 + 3\cos\varphi_i) \right]
        + 4\epsilon \sum_{i=1}^{N-2} \sum_{j=i+2}^{N} C_{ij} \left[ \left(\frac{\sigma}{R_{i,j}}\right)^{12} - D_{ij} \left(\frac{\sigma}{R_{i,j}}\right)^{6} \right]
$$
⊲ Disconnectivity graph describing merge events between basins
⊲ Ref: Oakley, Wales, Johnston, J. Phys. Chem., 2011
Sampling the Potential Energy Landscape (PEL) using Numerical Methods: the Example of Basin-Hopping
⊲ Basin-hopping and the basin-hopping transform
– Random walk in the space of local minima
– Requires a move set, an acceptance test (cf. Metropolis), and the ability to descend the gradient
[Figure: energy E over conformational space C]
⊲ Ref: Schön and Jansen, Prediction, determination and validation of phase diagrams via the global study of energy landscapes, Int. J. of Materials Research, 2009
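The three ingredients listed above (move set, acceptance test, local descent) can be sketched for a one-dimensional energy. This is an illustrative sketch only: the function names, the uniform move set, the fixed-step gradient descent, and all parameter values are assumptions, not the setup used in the talk.

```python
import math
import random


def local_minimize(df, x, step=0.01, iters=2000):
    """Fixed-step gradient descent, standing in for any local optimizer."""
    for _ in range(iters):
        x -= step * df(x)
    return x


def basin_hopping(f, df, x0, n_hops=200, move=2.0, temperature=1.0, seed=0):
    """Random walk over local minima with a Metropolis acceptance test."""
    rng = random.Random(seed)
    x = local_minimize(df, x0)
    best = x
    for _ in range(n_hops):
        # Move set: uniform perturbation, followed by a descent to a local minimum.
        trial = local_minimize(df, x + rng.uniform(-move, move))
        delta = f(trial) - f(x)
        # Metropolis: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x = trial
            if f(x) < f(best):
                best = x
    return best
```

On a tilted double well such as f(x) = x^4 − 4x^2 + x, a walk started in the shallower right-hand basin is expected to end in the deeper left-hand one, which plain gradient descent from the same start cannot reach.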
Landscape Exploration: Transition-based Rapidly-exploring Random Tree (T-RRT)
⊲ Algorithm growing a random tree that favors yet unexplored regions
– selection of the node to extend: Voronoi bias
– node extension: interpolation + Metropolis criterion (+ temperature tuning)
[Figure: T-RRT extension step in conformational space C, with points p_r, p_e, p_n, step δ, and temperature T]
⊲ Ref: LaValle, Kuffner, IEEE ICRA 2000
⊲ Ref: Jaillet, Corcho, Pérez, Cortés, J. Comp. Chem., 2011
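A simplified sketch of the extension loop follows. It is an assumption-laden illustration rather than the published algorithm: the brute-force nearest-node search, the temperature rule (divide by alpha after an accepted uphill move, multiply by alpha after every rejection), the clamping bounds, and all names and defaults are hypothetical choices.

```python
import math
import random


def t_rrt(energy, start, bounds, n_iter=500, delta=0.2, temp=1.0, alpha=2.0, seed=0):
    """Grow a tree from `start`; returns the node list and a parent-index map."""
    rng = random.Random(seed)
    nodes = [tuple(start)]
    parent = {0: None}
    for _ in range(n_iter):
        # Voronoi bias: a uniform sample selects the tree node closest to it.
        q_rand = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], q_rand))
        q_near = nodes[i_near]
        d = math.dist(q_near, q_rand)
        if d == 0.0:
            continue
        # Interpolation: step of length at most delta toward the sample.
        t = min(1.0, delta / d)
        q_new = tuple(a + t * (b - a) for a, b in zip(q_near, q_rand))
        # Metropolis-like transition test with temperature tuning.
        d_e = energy(q_new) - energy(q_near)
        if d_e > 0:
            if rng.random() < math.exp(-d_e / temp):
                temp = max(temp / alpha, 1e-12)   # accepted uphill: cool down
            else:
                temp = min(temp * alpha, 1e12)    # rejected: heat up
                continue
        parent[len(nodes)] = i_near
        nodes.append(q_new)
    return nodes, parent
```

Downhill extensions are always accepted, so the tree first drains into nearby basins; the temperature then rises on repeated rejections until a barrier can be crossed.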
Representing Sampled Landscapes
⊲ Ground space: conformational space
⊲ Elevation: potential energy / score
⊲ Nearest neighbor graph (NNG)
– connect each sample to its k-nearest neighbors (l-RMSD)
– faces the curse of dimensionality . . . yet workarounds exist: data structures handling NN queries in metric spaces
⊲ Pseudo-gradient vector field: oriented NNG, i.e. connect each sample to its highest neighbor
[Figure: elevation profile E with samples p_i, p_j, p_k, p_l and local minima m_1, m_2; p_i and p_k are labeled σ(1), m_1 and m_2 are labeled σ(0)]
Energy Landscape Analysis: Morse Sketching
⊲ Input:
◮ a collection of conformations { c_i }
◮ or better: samples and the associated local minima. But this
– requires the gradient of the energy / score
– or derivative-free optimization methods (CMA-ES)
⊲ Output:
◮ Transition graph connecting minima and saddles
◮ Basins associated with local minima
⊲ Method:
◮ Simulate a gradient descent from each point
◮ Identify ridges across basins, aka bifurcations
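The method can be sketched on a neighbor graph given as adjacency sets. This is a hedged illustration, not the authors' implementation: descent follows the lowest neighbor, a ridge is taken to be any edge joining two basins, and the saddle height between two basins is estimated by the lowest such edge; all names are hypothetical.

```python
def basins(adj, energy):
    """Label each sample with the local minimum reached by steepest descent on the graph."""
    def down(i):
        j = min(adj[i], key=lambda n: energy[n], default=i)
        return j if energy[j] < energy[i] else i

    labels = []
    for i in range(len(adj)):
        cur = i
        while down(cur) != cur:   # energy strictly decreases, so this terminates
            cur = down(cur)
        labels.append(cur)
    return labels


def transition_graph(adj, energy, labels):
    """Map each pair of adjacent basins to the height of their lowest crossing edge."""
    saddles = {}
    for i, nbrs in enumerate(adj):
        for j in nbrs:
            a, b = labels[i], labels[j]
            if a != b:
                key = (min(a, b), max(a, b))
                height = max(energy[i], energy[j])
                if key not in saddles or height < saddles[key]:
                    saddles[key] = height
    return saddles
```

The keys of the returned dictionary are the edges of the transition graph, and the values approximate the saddle energies, within the resolution of the sampling.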
Critical Points and Stable Manifolds: Illustrations for functions z = f(x, y)
⊲ Following the pseudo-gradient yields:
◮ Local minima
◮ Stable manifold of local minima: points flowing to local minima
◮ Index-one saddles
⊲ Test functions, with counts that read as (local minima, index-one saddles, local maxima):
◮ Himmelblau: (4, 4, 1)
◮ Rastrigin: (121, 220, 100)
◮ Gauss6a: (3, 5, 3)
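As a concrete instance, the Himmelblau function f(x, y) = (x^2 + y − 11)^2 + (x + y^2 − 7)^2 has a known local minimum at (3, 2), where f vanishes. The sketch below follows the exact gradient from a nearby start; the step size, iteration count, and function names are illustrative assumptions.

```python
def himmelblau(x, y):
    """Himmelblau test function."""
    return (x * x + y - 11.0) ** 2 + (x + y * y - 7.0) ** 2


def himmelblau_grad(x, y):
    """Analytic gradient of the Himmelblau function."""
    a = x * x + y - 11.0
    b = x + y * y - 7.0
    return 4.0 * x * a + 2.0 * b, 2.0 * a + 4.0 * y * b


def descend(x, y, step=0.002, iters=5000):
    """Fixed-step gradient descent; the end point identifies the basin of the start."""
    for _ in range(iters):
        gx, gy = himmelblau_grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y
```

Running `descend` from every point of a sample set and grouping the starts by end point gives a direct picture of the stable manifolds of the local minima.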