Toward automated structure determination from near-atomic resolution data Frank DiMaio University of Washington Institute for Protein Design November 2014
2 Accurate structure determination with RosettaEM: Homology modelling (• Template identification • Multi-model docking); De novo model building; Model extension; All-atom refinement (• B factor fitting • Cross validation)
4 Lack of sidechain detail makes identifying sequence difficult. Crystallographic “autotracing”: backbone tracing, sequence registration. Example: 4.8 Å reconstruction of the 20S proteasome (courtesy Yifan Cheng & Xueming Li)
5 Searching density for local backbone conformations. Local sequence restricts local structure (…CVKVTKPLVARAKL…). 6-dimensional search & refinement; sidechain building. Ray Wang (in review)
6 Selecting a maximally consistent set of fragments
Idea: The correct placements must all be consistent
• adjacent fragments must assign the same residue to the same location
• residues close in sequence must be close in space
• no two residues can occupy the same space
\[
\mathrm{score}(F) = \sum_{f_i \in F} sc_{\mathrm{dens}}(f_i) + \sum_{f_i, f_j \in F} sc_{\mathrm{overlap}}(f_i, f_j) + \sum_{f_i, f_j \in F} sc_{\mathrm{close}}(f_i, f_j) + \sum_{f_i, f_j \in F} sc_{\mathrm{clash}}(f_i, f_j)
\]
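The scoring idea on this slide can be illustrated with a small Monte Carlo sketch. This is not the RosettaEM implementation: the `Placement` class, the numeric weights and cutoffs inside `pair_score`, and the move set in `monte_carlo_select` are all assumptions chosen for illustration. Only the four-term structure of score(F) comes from the slide.

```python
import itertools
import math
import random
from dataclasses import dataclass

CA_SPACING = 3.8  # approximate distance between consecutive C-alpha atoms, in Å


@dataclass(frozen=True)
class Placement:
    """Hypothetical container for one fragment placed in the map."""
    start: int     # sequence index of the first residue in the fragment
    coords: tuple  # one (x, y, z) C-alpha position per residue
    dens: float    # precomputed fit-to-density score (lower is better)


def pair_score(a, b, tol=1.0, clash_dist=2.0):
    """Overlap, closeness and clash terms for two placements (assumed weights)."""
    total = 0.0
    for i, pa in enumerate(a.coords, start=a.start):
        for j, pb in enumerate(b.coords, start=b.start):
            d = math.dist(pa, pb)
            if i == j:
                # sc_overlap: the same residue should sit at the same location
                total += -1.0 if d <= tol else 2.0
            else:
                if d < clash_dist:
                    total += 5.0  # sc_clash: two residues occupy the same space
                if d > CA_SPACING * abs(i - j) + tol:
                    total += 1.0  # sc_close: too far apart for their sequence gap
    return total


def score(selected):
    """score(F): sum of sc_dens over F plus the pairwise terms over pairs in F."""
    total = sum(p.dens for p in selected)
    for a, b in itertools.combinations(selected, 2):
        total += pair_score(a, b)
    return total


def monte_carlo_select(candidates, steps=2000, kT=1.0, seed=0):
    """candidates[k] lists the alternative placements for fragment window k.

    Each window is assigned one placement index or None (unassigned).
    Returns (best_score, best_assignment).
    """
    rng = random.Random(seed)

    def chosen(assignment):
        return [candidates[k][c] for k, c in enumerate(assignment) if c is not None]

    state = [None] * len(candidates)
    current = score(chosen(state))
    best, best_state = current, list(state)
    for _ in range(steps):
        k = rng.randrange(len(candidates))
        trial = list(state)
        trial[k] = rng.choice([None] + list(range(len(candidates[k]))))
        new = score(chosen(trial))
        # Metropolis criterion: always accept downhill, sometimes accept uphill
        if new <= current or rng.random() < math.exp((current - new) / kT):
            state, current = trial, new
            if current < best:
                best, best_state = current, list(state)
    return best, best_state
```

In this sketch a decoy placement can have a better individual density score than the correct one and still lose, because it cannot be made consistent with its neighbours. That is the point of scoring the set rather than each fragment on its own.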
7 Monte Carlo sampling correctly identifies sequence [Figure, Round 1: panels Density Map, Fragment Placement, Monte Carlo Sampling, Partial Model. Plotted quantities: score (10^3), number of fragments assigned, accuracy. Fragment placements are laid out over secondary-structure elements H1 to H6 and S1 to S10, with an RMSD scale from 0.5 to 45.]
8 Multiple rounds of sampling completes model [Figure, Rounds 2 and 3: panels Density Map, Partial Model, Fragment Placement, Monte Carlo. Plotted quantities: score (10^3) and number of fragments assigned for each round; fragment placements versus residue (0 to 221) with an RMSD scale from 0.5 to 45; accuracy (%) from 60 to 100.]
9 20S proteasome α-subunit at 4.8 Å [Figure panels: Density; Final Partial Model; Overlay of the full-length model (red) to the native (blue). Values shown: 1.28 Å, 1.19 Å, 196/213 residues.]
10 Automatic structure determination is accurate in 6 of 9 cases

Target   PDB ID (chain)   EMDB ID   Reported resolution (Å)   Length (aa)   Partial model Cα RMSd [Å] (%)   Cα RMSd [Å]
TMV      3j06 (A)         5185      3.3                       155           1.3 (81)                        1.7
TRPV1    3j5q (A)         5778      3.4                       310           1.1 (76)                        1.4
FrhA     4ci0 (A)         2513      3.4                       385           2.3 (91)                        1.3
FrhB     4ci0 (C)         2513      3.4                       280           1.4 (85)                        1.7
FrhG     4ci0 (B)         2513      3.4                       228           1.6 (73)                        2.2
BPP1     3j4u (A)         5764      3.5                       327           17.2 (42)                       -
VP6      1qhd (A)         1461      3.8                       397           1.6 (52)                        -
20S-α    1pma (A)         TBD       4.8                       221           1.3 (88)                        1.2
STIV     3j31 (A)         5584      3.9                       344           21.9 (26)                       -
11 Automatic structure determination is accurate in 6 of 9 cases [Figure panels: Density; Final Partial Model; Overlay of the full-length model (red) to the native (blue). TRPV1, 3.4 Å: 1.26 Å, 1.43 Å, 74.9% (236/315 residues). FrhB, 3.4 Å: 1.40 Å, 1.62 Å, 85.1% (239/281 residues).]
12 Crystallographic chain tracing is generally unable to register sequence
Using Buccaneer:

Target   PDB ID (chain)   Length (aa)   Cα atoms placed   Sequence registered   Correctly registered
TMV      3j06 (A)         155           145               56                    0
TRPV1    3j5q (A)         315           257               190                   0
FrhA     4ci0 (A)         386           382               367                   185 (48%)
FrhB     4ci0 (C)         281           192               186                   126 (45%)
FrhG     4ci0 (B)         228           242               190                   63 (27%)
BPP1     3j4u (A)         327           339               162                   0
VP6      1qhd (A)         397           405               155                   0
20S-α    1pma (A)         221           224               135                   7 (3%)
STIV     3j31 (A)         345           553               259                   0
13 Failures are primarily in sheets [Figure panels: Density; Partial Model; Native. Rotavirus VP6, 3.8 Å: 1.62 Å, 52.1% (207/397 residues). STIV, 3.9 Å: 2.46 Å, 20.0% (69/345 residues).]
14 VipAB structure determination VipA: 168 residues VipB: 492 residues with Misha Kudryashev, Marek Basler, Ed Egelman ( in review )
15 VipAB structure determination: 446/660 residues
16 Our method corrects errors from the manually traced model [Figure panels: manual model; automated model]
17 Our method corrects errors from the manually traced model [Figure: per-residue E_dens and E_geom for residues GLU 88 to GLU 121, manual versus automatic model]
18 Accurate structure determination with RosettaEM: Homology modelling (• Template identification • Multi-model docking); De novo model building; Model extension; All-atom refinement (• B factor fitting • Cross validation)
19 Refinement against EM density • Refinement • identify (and correct) errors in the initial model • improve fit to data • improve model geometry
20 Refinement at low resolution requires a better geometry potential
Refinement: find atom positions optimizing
\[
E = E_{\mathrm{geom}} + w \cdot E_{\mathrm{data}}
\]
[Figure: schematic of E_data and E_geom in the high-resolution and low-resolution cases]
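A one-dimensional toy can show how the weight w trades the two terms off. Everything here is an assumption made for illustration: both terms are simple quadratics, `x_geom` stands for the value ideal geometry prefers, and `x_data` for the value the map prefers. Real refinement targets are far more complex.

```python
def total_energy(x, w, x_geom=1.5, x_data=2.0):
    """E = E_geom + w * E_data for a single coordinate x (toy quadratic terms)."""
    e_geom = (x - x_geom) ** 2   # geometry term: prefers the ideal value
    e_data = (x - x_data) ** 2   # data term: prefers the value the map suggests
    return e_geom + w * e_data


def refine(w, x_start=0.0, x_geom=1.5, x_data=2.0, steps=200):
    """Gradient descent on E, with a step size scaled so the iteration is stable."""
    x = x_start
    step = 0.25 / (1.0 + w)
    for _ in range(steps):
        grad = 2.0 * (x - x_geom) + 2.0 * w * (x - x_data)
        x -= step * grad
    return x
```

For this toy the minimum is known in closed form, x* = (x_geom + w · x_data) / (1 + w). A small w leaves the result near the geometric ideal and a large w pulls it toward the data. When the data are weak or ambiguous, the result therefore depends heavily on how good the geometry term is.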
21 Rosetta forcefield disambiguates low-resolution solutions. Information from known structures reduces conformational space: • Core packing • Hydrogen bonding • Electrostatics • Rotamer probabilities • Torsional probabilities + tools for improved optimization (discrete sidechain optimization, torsion and Cartesian space minimization, dynamics)
22 Our approach improves refinement against low-resolution crystallographic data [Figure: histogram of number of structures (0 to 14) versus RMS to deposited structure (bins 0-1 to 6-7); series: start, phenix, DEN, Refmac, Rosetta]
23 Key components for refinement against cryoEM • Model validation • Independent map agreement over high-resolution shells • Variations in local resolution • Atomic B factors describing how spread the density is around each atom • Small radius of convergence • Discrete backbone optimization in refinement
24 Independent validation: refine models into reconstruction 1 (“train map”); evaluate models against reconstruction 2 (“test map”)
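Agreement between a model-derived map and the held-out map is commonly measured with the Fourier shell correlation (FSC). The NumPy sketch below uses the standard definition: per shell, the real part of the summed cross term divided by the square root of the product of the summed powers. The function name, the shell binning, and the `voxel_size` parameter are assumptions for illustration, not the code behind these slides.

```python
import numpy as np


def fourier_shell_correlation(map1, map2, voxel_size=1.0):
    """Return (spatial_frequency, fsc) for two 3D maps on the same grid.

    spatial_frequency is in 1/Å when voxel_size is given in Å.
    """
    if map1.shape != map2.shape:
        raise ValueError("maps must have the same shape")
    f1 = np.fft.fftn(map1)
    f2 = np.fft.fftn(map2)

    # Spatial frequency magnitude of every Fourier coefficient
    axes = [np.fft.fftfreq(n, d=voxel_size) for n in map1.shape]
    grids = np.meshgrid(*axes, indexing="ij")
    radius = np.sqrt(sum(g ** 2 for g in grids))

    # Assign each coefficient to a shell one frequency step wide
    shell_width = 1.0 / (max(map1.shape) * voxel_size)
    shell = np.rint(radius / shell_width).astype(int).ravel()

    cross = np.bincount(shell, weights=(f1 * np.conj(f2)).real.ravel())
    power1 = np.bincount(shell, weights=(np.abs(f1) ** 2).ravel())
    power2 = np.bincount(shell, weights=(np.abs(f2) ** 2).ravel())

    denom = np.sqrt(power1 * power2)
    fsc = np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)
    spatial_frequency = np.arange(len(fsc)) * shell_width
    return spatial_frequency, fsc
```

In a train/test setup, `map1` would be density calculated from the model refined into the training map and `map2` would be the test map. A model that fits the training map much better than the test map in the high-resolution shells is likely overfit.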
25 Independent validation [Figure: Fourier shell correlation versus 1/resolution, S (1/Å), from 0 to 0.3, with 10 Å and 6 Å marked; curves: training versus test map, training map versus w=0.1 model, testing map versus w=0.1 model, training map versus w=20 model, testing map versus w=20 model]
26 Independent validation [Figure: FSC (12-6 Å) for the training map and the testing map, and Rosetta energy (x10^5), versus density weight log(w_a) from -4 to 2]
27 Fitting atomic B factors • In addition to refining atomic coords, refine per-atom B factors (in real space) • Alternate coordinate refinement and B factor refinement • Constraint function keeps B factors of nearby atoms close
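One possible form for the constraint on nearby B factors is a harmonic penalty on B-factor differences between atoms within a distance cutoff. The slide does not give the functional form, so the quadratic form, the 4 Å cutoff, and the function names below are all assumptions.

```python
import numpy as np


def neighbor_pairs(coords, cutoff=4.0):
    """Index pairs (i, j), i < j, of atoms closer than cutoff (in Å)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return np.nonzero(np.triu(dist < cutoff, k=1))


def b_restraint(b, pairs, weight=1.0):
    """Energy weight * sum (B_i - B_j)^2 over pairs, and its gradient in B."""
    i, j = pairs
    delta = b[i] - b[j]
    energy = weight * np.sum(delta ** 2)
    grad = np.zeros_like(b)
    np.add.at(grad, i, 2.0 * weight * delta)    # dE/dB_i
    np.add.at(grad, j, -2.0 * weight * delta)   # dE/dB_j
    return energy, grad
```

In an alternating scheme, this energy and gradient would be added to the density-fit term during the B-factor step while coordinates are held fixed, and the roles would be swapped for the coordinate step.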
28 Model B factors have good agreement with crystallographic B factors [Figure panels: deposited crystal structure; cryoEM map, real-space B factors (1pma)]
29 Iterative density-guided conformational sampling • find allowed backbone conformations • optimize into density with minimal forcefield
30 Assessing the role of starting-model quality on structure determination
20S proteasome at 3.3 Å resolution (with Yifan Cheng, Xueming Li)

Template    Sequence ID
1yar        100%
3h4p        50%
3nzj        32%
1iru        30%
1ryp        30%
1q5q        26%
3unf        25%
1m4y        20%
2x3b/2z3b   19%
4hnz        17%
1g3k        17%
1g0u        17%

[Figure: fraction of residues within 1 Å (0 to 1) versus starting model (sorted by difficulty); series: input, MDFF, Rosetta]
31 We can accurately determine structures to atomic resolution at 4.4 Å or better [Figure: fraction of residues within 1 Å (0 to 1) versus starting model (sorted by difficulty); series: input, MDFF, Rosetta; panels: 4.1 Å (5k particles), 4.4 Å (3k particles), 6.0 Å (1k particles)]
32 Model convergence is an indicator of accuracy [Figure panels: 1g0u (6.0 Å); 1g0u (3.3 Å)]
33 Independent FSC is an indicator of accuracy (though not absolute) [Figure axes: FSC; fraction of residues within 1 Å]
34 Model strain can also indicate errors [Figure axes: angle violations (energy units); residue]
35 Refinement of TRPV1: Deposited structure. Legend: – No violations – Bond lengths – Bond angles – Dihedral angles – Sidechain rotamer outliers – Cβ deviations – Ramachandran angles
36 Local strain reveals errors. Legend: – No violations – Bond lengths – Bond angles – Dihedral angles – Sidechain rotamer outliers – Cβ deviations – Ramachandran angles
37 Local strain reveals errors. Legend: – No violations – Bond lengths – Bond angles – Dihedral angles – Sidechain rotamer outliers – Cβ deviations – Ramachandran angles
38 Final refined model. Legend: – No violations – Bond lengths – Bond angles – Dihedral angles – Sidechain rotamer outliers – Cβ deviations – Ramachandran angles