Kristina Djinović-Carugo Advanced use of databases in the hybrid structural research: PDB Department of Structural and Computational Biology, Max F. Perutz Laboratories, University of Vienna, Austria EMBO Global WS: Structural and biophysical methods for biological macromolecules in solution Singapore, 12th December 2017
Structural databases • PDB • SASDB • EM
Statistics by method
Pathway for generation of 3D structure
Pathway o Experiments : Crystallisation, X-ray diffraction o Computational : Structure determination, refinement, analysis
Why a crystal • X-ray scattering from a single molecule would be very weak and could not be detected above the noise level • A crystal arranges large numbers of molecules in the same orientation • Scattered waves can add up in phase and raise the signal to a measurable level • Crystal acts as an amplifier
Why crystal • The waves add up in phase in some directions and have to cancel out in other directions • X-rays diffracted from a crystal and detected on a flat 2D-detector
Image of molecule: electron density distribution • Electromagnetic radiation interacts with matter through its fluctuating electric field • The result of an X-ray crystallographic experiment is the distribution of electrons in the molecule
Image of molecule – electron density
Final result of structure determination (is none of what you see)
Final result: atomic 3D coordinates
Crystallographic terms
Reflection - Intensity • Intensities of diffracted beams are measured • Reflections, I - intensities • √I = |F_o| - structure factor amplitudes
Resolution • RES = ½ [λ/sin(θ)] • Detail that can be resolved in electron density maps • Cα-Cβ: 1.54 Å
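The relation RES = ½ [λ/sin(θ)] on this slide is Bragg's law solved for the spacing d. A minimal sketch of the arithmetic follows; the function name and the example wavelength and angle are illustrative assumptions, not values from the slides.

```python
import math

def resolution_from_bragg(wavelength_A: float, theta_deg: float) -> float:
    """Return RES = lambda / (2 sin(theta)) in Å.

    theta_deg is the Bragg angle, i.e. half of the total scattering angle 2-theta.
    """
    if not 0.0 < theta_deg <= 90.0:
        raise ValueError("theta must be in (0, 90] degrees")
    return wavelength_A / (2.0 * math.sin(math.radians(theta_deg)))

# Illustrative values only: 1.0 Å X-rays diffracted at a Bragg angle of 30 degrees.
print(round(resolution_from_bragg(1.0, 30.0), 3))
```

Larger Bragg angles correspond to smaller RES values, that is, to finer detail in the electron density map.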
Resolution of electron density map and consequently of the 3D model 1.0 Å 2.5 Å 3.0 Å 4.0 Å
Resolution of electron density map • 1 Å resolution individual atoms can be fitted
Resolution of electron density map • Alpha-helices are clear at 6 Å resolution, but beta-strands are not. • At lower resolutions than about 8 Å, only whole molecules can be placed.
RES in PDB
R factor • $R = \frac{\sum_h \left| |F_o(h)| - |F_c(h)| \right|}{\sum_h |F_o(h)|}$ • |F_o| - from measured diffraction intensity • Experimental data • |F_c| - calculated from coordinates • Global measure of agreement between experiment and the model • Surface residues can be less recognizable
R free • $R_{free} = \frac{\sum_h \left| |F_o(h)| - |F_c(h)| \right|}{\sum_h |F_o(h)|}$, summed only over the test-set reflections • Calculated for 5-10% of reflections not used in refinement • How well the model agrees with the data that it has not been fit to
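R and R free use the same ratio (sum of absolute amplitude differences over sum of observed amplitudes) and differ only in which reflections enter the sums. The sketch below shows this under simple assumptions: amplitudes are given as plain lists already on a common scale, and `free_flags` is a hypothetical boolean list marking the test-set reflections. Real refinement programs also handle scaling and weighting, which is left out here.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over the given reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)

def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by flag and return (R_work, R_free)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free

# Made-up amplitudes, for illustration only; the last reflection is in the test set.
f_obs = [100.0, 50.0, 25.0, 80.0]
f_calc = [90.0, 55.0, 30.0, 60.0]
print(r_work_and_free(f_obs, f_calc, [False, False, False, True]))
```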
R/R free • For most structures refined at RES=2.5 Å, R is less than 0.2 and R free less than 0.25
R free in PDB
RES and R factor for ranking • Ranking of quality of structures • Q = 1/RES − R • Higher resolution (a smaller RES value in Å) and lower R are associated with higher Q
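A minimal sketch of the ranking, reading the slide's formula as Q = 1/RES − R (this reading of the flattened fraction is an assumption). The entry labels and numbers are invented for illustration.

```python
def quality_index(resolution_A: float, r_factor: float) -> float:
    """Q = 1/RES - R, with RES in Å and R as a fraction (e.g. 0.20)."""
    if resolution_A <= 0:
        raise ValueError("resolution must be positive")
    return 1.0 / resolution_A - r_factor

def rank_by_quality(entries):
    """Sort (label, RES, R) tuples from highest to lowest Q."""
    return sorted(entries, key=lambda e: quality_index(e[1], e[2]), reverse=True)

# Hypothetical entries, not real PDB identifiers.
entries = [("entry_a", 2.5, 0.20), ("entry_b", 1.0, 0.15), ("entry_c", 2.0, 0.25)]
for label, res, r in rank_by_quality(entries):
    print(label, round(quality_index(res, r), 2))
```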
Thermal motion • B_iso = isotropic thermal factor • B_iso = 8π²⟨u²⟩ • ⟨u²⟩ = mean-square displacement of the atom from its mean position • B_iso = 0 Å² at T = 0 K
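To give a feel for the numbers, the sketch below converts between B_iso and the root-mean-square displacement using the standard relation B_iso = 8π²⟨u²⟩. The function names are illustrative, and the B value in the example is arbitrary.

```python
import math

def b_iso_from_msd(mean_square_displacement_A2: float) -> float:
    """B_iso (Å^2) from the mean-square displacement <u^2> (Å^2)."""
    return 8.0 * math.pi ** 2 * mean_square_displacement_A2

def rms_displacement_from_b(b_iso_A2: float) -> float:
    """Root-mean-square displacement (Å) corresponding to a given B_iso (Å^2)."""
    if b_iso_A2 < 0:
        raise ValueError("B_iso must be non-negative")
    return math.sqrt(b_iso_A2 / (8.0 * math.pi ** 2))

# Arbitrary example: B_iso = 20 Å^2 corresponds to roughly half an ångström.
print(round(rms_displacement_from_b(20.0), 3))
```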
Thermal motion • B_iso: 3 coordinates + 1 B per atom • B_aniso: 3 coordinates + 6 B per atom
Thermal motion
Atomic displacement parameters (B) • B_main < B_side (main-chain atoms are generally less mobile than side-chain atoms) • B factors absorb lattice defects, large scale movements, disorder
Atomic displacement parameters (B) • B factors absorb lattice defects, large scale movements, disorder • Inform on function • Access to internal cavities, substrate channels • Correlation between thermal stability and thermal motions/flexibility
T of experiment • X-tal structures @ T = 100 K • NMR @ RT
T of experiment • At 160 K - 200 K proteins undergo a phase transition • Conformational disorder goes from dynamic to static • Lower T reduces conformational distribution of sidechains: • → smaller, more tightly packed and unique modes
Flowchart/Crystallographic terms • Protein • Solution • Heavy Atom Derivative • Crystals: a, b, c, α, β, γ, symmetry, |F_o| • h, k, l, |F_P| • h, k, l, |F_PH| • solvent content, # mol./a.u. • α_P • Electron density ρ(x,y,z) • h, k, l, |F_calc|, α_calc • Structure (x_i, y_i, z_i, B_i) • R-factor • Publication, …
PDB statistics – depositions per year and cumulative growth
Folds
Redundancy of PDB • 135787 Biological Macromolecular Structures • 12.12.2017
Why redundancy… • Cover a limited space of biological macromolecular universe • Same or similar proteins, e.g. • Lysozyme > 500 entries • Membrane proteins : 2-3% PDB entries • 15% - 35% of human proteome • Intrinsically disordered proteins
Non-redundant PDB subsets • PDBselect: reject proteins with aa sequence identity > threshold • http://bioinfo.tg.fh-giessen.de/pdbselect/ • PISCES: download precompiled datasets • dunbrack.fccc.edu/PISCES.php • “Advanced search utility” in PDB • Skip-Redundant of EMBOSS • Cd-hit • http://weizhongli-lab.org/cd-hit/
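The idea of rejecting proteins above a sequence identity threshold can be sketched as a greedy filter. This is a toy illustration and not the algorithm of PDBselect, PISCES, Skip-Redundant or CD-HIT: it assumes the sequences are already aligned to equal length (gaps as "-") and already sorted best-first, for example by resolution. All names and sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned, non-gap column pairs."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)

def cull_redundant(entries, threshold_percent: float):
    """Keep an entry only if it is at most threshold_percent identical to every kept entry.

    entries: list of (label, aligned_sequence), ordered from most to least preferred.
    """
    kept = []
    for label, seq in entries:
        if all(percent_identity(seq, kept_seq) <= threshold_percent for _, kept_seq in kept):
            kept.append((label, seq))
    return [label for label, _ in kept]

# Invented toy sequences; "B" is 90% identical to "A" and is therefore rejected.
entries = [("A", "ACDEFGHIKL"), ("B", "ACDEFGHIKV"), ("C", "WWWWWYYYYY")]
print(cull_redundant(entries, 50.0))
```

Because the filter is greedy, the result depends on the input order, which is why the preferred (for example, higher quality) entries should come first.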
… but sequence is not all • Same sequence can adopt 2 different structures • e.g.: Calmodulin • Procedure that also takes topology into account • Bioinformatics (2008), 24, 2632
CHECK FIGURES OF MERIT, STEREOCHEMISTRY
Missing residues • Interpretation of electron density ( ρ ) allows positioning of atoms • Sometimes ρ is elusive and thus positioning of atoms uncertain or impossible
Treatments of invisible residues/atoms • Omit the atoms/aa residues • Amino acid residue ‘torsos’ • … molecular graphics software does not always warn
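When residues are simply omitted, one quick heuristic is to look for jumps in the residue numbering of a chain. The sketch below assumes the residue numbers of one chain have already been extracted; a gap is only a hint, since numbering jumps can also come from the authors' numbering scheme rather than from missing density.

```python
def find_numbering_gaps(residue_numbers):
    """Return (first_missing, last_missing) ranges implied by jumps in residue numbering."""
    numbers = sorted(set(residue_numbers))
    gaps = []
    for previous, current in zip(numbers, numbers[1:]):
        if current - previous > 1:
            gaps.append((previous + 1, current - 1))
    return gaps

# Made-up numbering for one chain: residues 4-6 and 9 are absent from the model.
print(find_numbering_gaps([1, 2, 3, 7, 8, 10]))
```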
Example “Torso”
Treatments of invisible residues/atoms • Leave the atom in the model → large B factors • Easily visualized by molecular graphics
Treatments of invisible residues/atoms • Leave the atom in the model with occupancy 0 • No alert by molecular graphics
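Since molecular graphics may not flag these cases, a short script can. The sketch below reads the occupancy and B factor from the fixed columns of PDB-format ATOM/HETATM records (occupancy in columns 55-60, B factor in columns 61-66) and lists atoms with zero occupancy or with B above a cutoff. The cutoff of 80 Å² is an arbitrary assumption, and `make_atom_line` is a hypothetical helper used only to build demonstration records.

```python
def flag_atoms(pdb_lines, b_cutoff=80.0):
    """Return (zero_occupancy_atoms, high_b_atoms) as lists of (chain, resseq, resname, atom)."""
    zero_occ, high_b = [], []
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        label = (line[21], int(line[22:26]), line[17:20].strip(), line[12:16].strip())
        occupancy = float(line[54:60])   # columns 55-60
        b_iso = float(line[60:66])       # columns 61-66
        if occupancy == 0.0:
            zero_occ.append(label)
        if b_iso > b_cutoff:             # b_cutoff is an arbitrary, illustrative threshold
            high_b.append(label)
    return zero_occ, high_b

def make_atom_line(serial, name, resname, chain, resseq, occ, b_iso, xyz=(0.0, 0.0, 0.0)):
    """Hypothetical helper: build a fixed-column ATOM record for the demo."""
    x, y, z = xyz
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain:1s}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b_iso:6.2f}")

# Invented records for a single lysine residue.
demo = [
    make_atom_line(1, " N  ", "LYS", "A", 43, 1.00, 25.0),
    make_atom_line(2, " CA ", "LYS", "A", 43, 0.00, 20.0),
    make_atom_line(3, " NZ ", "LYS", "A", 43, 1.00, 95.5),
]
print(flag_atoms(demo))
```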
Occurrence of invisible residues/atoms • 20% of structures at atomic RES contain invisible residues • 80% of structures at 1.5 Å RES contain 80% of invisible residues • At atomic RES 2-3% of residues are invisible • At 2.0 Å RES 7% of residues are invisible • At 3.0 Å RES 10% of residues are invisible
Reasons for invisible residues/atoms • Proteolysis Ltd • Quality/quantity of diffraction data • Conformational disorder
…caveat • Invisible residues are often on surfaces • Caution if surface properties are investigated: • Electrostatic potential K43
…caveat • Invisible residues are often on surfaces • Caution if surface properties are investigated: • Electrostatic potential calculation is affected! “Torso”
Conformational disorder / Occupancy • Residues/atoms do not occupy the same position in all molecules in the crystal/ensemble • → weak electron density → invisible • At medium/high RES multiple conformations can be observed • Static disorder • Dynamic disorder
Static/dynamic disorder • Static: two or more conformations exist • Dynamic (at higher T): shuffling from one conformation to another • Crystal structure determination gives time and space averaged structural information • → cannot distinguish between static and dynamic disorder
Alternative conformations • The sum of the occupancies of the alternative conformations is 1
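This occupancy rule is easy to check once atom records have been parsed. The sketch below assumes atoms are given as (chain, residue number, atom name, alternate-location label, occupancy) tuples and reports sites whose alternative conformations do not sum to 1 within a tolerance. The tolerance and the data are illustrative; a sum below 1 is not necessarily an error, for example when a group is only partially occupied.

```python
from collections import defaultdict

def altloc_occupancy_sums(atoms):
    """Sum occupancies over alternate locations for each (chain, resseq, atom_name) site."""
    sums = defaultdict(float)
    for chain, resseq, name, altloc, occupancy in atoms:
        if altloc:  # atoms without an altloc label are not part of an alternative set
            sums[(chain, resseq, name)] += occupancy
    return dict(sums)

def inconsistent_sites(atoms, tol=0.01):
    """Sites whose alternative conformations do not sum to 1 within tol."""
    sums = altloc_occupancy_sums(atoms)
    return sorted(site for site, total in sums.items() if abs(total - 1.0) > tol)

# Invented records: residue 10 sums to 1.0, residue 11 only to 0.8.
atoms = [
    ("A", 10, "CB", "A", 0.6), ("A", 10, "CB", "B", 0.4),
    ("A", 11, "OG", "A", 0.5), ("A", 11, "OG", "B", 0.3),
    ("A", 12, "CA", "", 1.0),
]
print(inconsistent_sites(atoms))
```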
Disorder continued… • Occupancy of atoms/residues is < 1 • Ligand is not bound to a fraction of molecules in the crystal • Weak binding, suboptimal binding conditions • Misplaced ligand • Partial disorder • X-ray induced radiation damage • loss of carboxylates, methyl groups…
Partial disorder of the ligand • Weichenberger et al. • Volume 73 | Part 3 | March 2017 | Pages 211–222 | 10.1107/S205979831601620X
Radiation damage at work • Garman • Volume 66 | Part 4 | April 2010 | Pages 339–351 | 10.1107/S0907444910008656
…caveat • Molecular graphics shows all alternative conformations • Surface properties calculation with all conformations!
Stereochemistry • Is the protein stereo-chemically sound? • Covalent distances, angles, torsion angles, backbone conformation, group planarity, chirality, H-bonds, electrostatic interactions
Ramachandran plot
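A Ramachandran plot shows the backbone torsion angles φ (C of residue i-1, N, Cα, C) and ψ (N, Cα, C, N of residue i+1) for each residue. The sketch below computes a torsion angle from four points using the usual atan2 formulation; the coordinates in the example are made up and only exercise the geometry.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle in degrees (-180, 180] defined by four points, about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# phi would be dihedral_deg(C_prev, N, CA, C); psi would be dihedral_deg(N, CA, C, N_next).
# Made-up points forming a right-angle torsion:
print(round(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))
```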
Ideal stereochemical parameters
Peptide bond     Average (Å)     Single bond     Average (Å)
Cα - C           1.53            C - C           1.54
C - N            1.33            C - N           1.48
N - Cα           1.46            C - O           1.43
Hydrogen bond    Average (Å, ± 0.3)
O-H --- O-H      2.8
N-H --- O=C      2.9
O-H --- O=C      2.8
Rms deviations from ideal geometry: bond lengths 0.01 - 0.02 Å, bond angles 1.2 - 1.5 deg
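The rms deviation from ideal geometry quoted on this slide is the root-mean-square difference between the bond lengths in the model and their ideal values. A minimal sketch follows, using the peptide-bond ideals from the slide; the "observed" lengths are invented for illustration.

```python
import math

def rms_deviation(observed, ideal):
    """Root-mean-square deviation between paired observed and ideal values."""
    if len(observed) != len(ideal) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((o - i) ** 2 for o, i in zip(observed, ideal)) / len(observed))

# Ideal values from the slide (Å): Cα-C, C-N (peptide), N-Cα.
ideal_lengths = [1.53, 1.33, 1.46]
# Invented model values (Å), each off by 0.01 Å.
observed_lengths = [1.52, 1.34, 1.47]
print(round(rms_deviation(observed_lengths, ideal_lengths), 3))
```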
Tools to check stereochemistry • Procheck • What_Check • MolProbity • ProSA • PDB – validation protocol, which examines also fit of experimental data
PDB – validation report
READ the REPORT • 15% of the PDB files of similar resolution are worse than this one (and 85% are better). • 35% of the PDB files are worse than this one (and 65% are better). • [Figure: percentile slider from the validation report, axis 0 to 100, ends labelled BETTER and WORSE]
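The percentile statements in the report can be read as "what share of reference entries score worse than this one". The sketch below shows that arithmetic for a single metric; it is an illustration only and not the procedure used to generate the PDB validation report. The reference values are invented.

```python
def percent_worse(value, reference_values, lower_is_better=True):
    """Percentage of reference entries that score worse than `value` on one metric."""
    if not reference_values:
        raise ValueError("reference set must not be empty")
    if lower_is_better:
        worse = sum(1 for r in reference_values if r > value)
    else:
        worse = sum(1 for r in reference_values if r < value)
    return 100.0 * worse / len(reference_values)

# Invented R free values of a reference set; lower R free is better.
reference_r_free = [0.20, 0.22, 0.25, 0.30]
print(percent_worse(0.24, reference_r_free))
```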
Re-refined structures • Database of automatically re-refined structures
Carugo O. & Djinović-Carugo K., Methods Mol Biol 2016: Criteria to extract high quality protein databank subsets from PDB. And references therein.