outline day 1 2
play

Outline Day 1 & 2 Introduction: The protein structure knowledge - PDF document

Swiss Institute of Bioinformatics SIB Doctoral School in Bioinformatics Advanced Course Protein Structure: Prediction and Analysis Day 2: Protein Structure Modeling Lausanne, September 1-5, 2008 Torsten Schw ede Biozentrum - Universitt


  1. Swiss Institute of Bioinformatics SIB Doctoral School in Bioinformatics Advanced Course Protein Structure: Prediction and Analysis Day 2: Protein Structure Modeling Lausanne, September 1-5, 2008 Torsten Schw ede Biozentrum - Universität Basel Swiss Institute of Bioinformatics Klingelbergstr 50-70 CH - 4056 Basel, Switzerland Tel: + 41-61 267 15 81 Outline Day 1 & 2 Introduction: The protein structure knowledge gap � Recap: Basic principles of proteins and their 3-dimensional structures � � Protein Structure modeling and prediction Comparative protein structure modeling � What happened to fold recognition? � De novo prediction � � Evaluation and Assessment of Protein Structure Model Quality Practicals & Tutorials: Tutorial: Structure Visualization with DeepView (Nicolas Guex, SIB Lausanne) � Practical: Examples of comparative modeling and model evaluation � Exam / credit points: � Student presentations on one of the evaluation examples

  2. The number of distinct protein domains in nature is limited. � Chothia (1992) Proteins. One thousand families for the molecular biologist. Nature. 3 5 7 : 543-4. � Idea: determine structures of representative proteins and then derive most other structures by homology modeling. � Structural Genomics (Protein Structure Initiative, Riken in Japan, SPINE in Europe) Fold Classification Fold Classification � Fold classification is an important to systematically study protein structure evolution � Multi-domain proteins have to be divided into domains prior to classification � There is no consensus on how to delineate the domains. � Three main protein structure classification databases are commonly used: � SCOP: manual classification based on evolutionary information � CATH: semi-automatic classification based on geometric criteria � FSSP: automatic classification based on direct structural similarity

  3. Fold Classification Databases � The CATH database is a hierarchical domain classification of protein structures in the Brookhaven protein databank. � UCL, Janet Thornton & Christine Orengo � clusters proteins semi-automated at four major levels: � Class(C) � Architecture(A) � Topology(T) � Homologous superfamily (H) [ http:/ / w w w .biochem .ucl.ac.uk/ bsm / cath_ new / ] Protein Structure Classification � Class( C) derived from secondary structure content is assigned automatically � Architecture( A) describes the gross orientation of secondary structures, independent of connectivity. � Topology( T) clusters structures according to their topological connections and numbers of secondary structures � Hom ologous Superfam ily ( H) This level groups together protein domains which are thought to share a common ancestor and can therefore be described as homologous. [ http: / / www.cathdb.info ]

  4. Fold Classification Databases � Top of the hierarchy: Example: 1EWF

  5. Example: 1EWF Fold Classification Databases � Structural Classification of Proteins: SCOP � MRC Cambridge (UK), Alexey Murzin, Brenner S. E., Hubbard T., Chothia C. � hierarchical classification of protein domain structures � created by manual inspection � comprehensive description of the structural and evolutionary relationships � organized as a hierarchical structure • Class • Fold • Superfamily • Family • Species [ http:/ / scop.m rc-lm b.cam .ac.uk/ scop/ ]

  6. Fold Classification Databases The different m ajor levels in the hierarchy are: � Fold : Major structural similarity Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. � Superfam ily : Probable common evolutionary origin Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable are placed together in superfamilies. � Fam ily : Clear evolutionarily relationship Proteins clustered together into families are clearly evolutionarily related. Generally, this means that pair wise residue identities between the proteins are 30% and greater. Fold Classification Databases

  7. Fold Classification Databases Fold Classification Databases

  8. qCOPS and Fold Space Navigator � Quantitative classification of protein structures, navigation through fold space and visualization of pairwise structure similarities. http: / / www.came.sbg.ac.at Protein Structure / Fold Databases PDB: http: / / www.pdb.org � � EBI-MSD http: / / www.ebi.ac.uk/ msd/ SCOP http: / / scop.mrc-lmb.cam.ac.uk/ scop/ � � CATH http: / / www.biochem.ucl.ac.uk/ bsm/ cath_new/ � FSSP: http: / / ekhidna.biocenter.helsinki.fi/ dali/ start

  9. Comparing Protein Structures � Why do we want to compare protein structures? � Classify structures � Identify structural movements (induced fit, NMR, etc.) � Analyze evolutionary relationships � Identify recurring structural motifs � Assess quality of predicted models � … Comparing Protein Structures What do we need to compare structures? � � Protein sequences can be treated as linear strings of letters . For a given similarity matrix, sequences can be aligned optimally using dynamic programming. Protein structures are 3-dimensional objects. We need � to find algorithms (analogue to DP for sequences) which find an optimal match for two shapes – given a certain similarity measure.

  10. Comparing Protein Structures � What do we need to compare structures? 1. Structural feature description 2. Comparison / superposition algorithms 3. Distance / similarity measure 1. Description A Description B 2. Similarity / Distance 3. Measure Comparing Protein Structures � Local or global comparison? � Global: n = 5 � Local: n = 4

  11. Comparing Protein Structures � Distance measure: Root mean square deviation � Comparing two sets of points (= atoms in structures) A = { a 1 … a n } and B = { b 1 … b n } with a i Position vector of atom i in structure A n Number of equivalent atoms � We need to define a 1: 1 correspondence for atoms in A and B � Root mean square distance is calculated from the squared Euclidian distances between corresponding points: n ∑ − 2 ( ) a b i i = = . . . . 0 i r m s d n Comparing Protein Structures Distances in Euclidian Space � For two points x = (x 1 ,x 2 ,x 3 ,… ,x n ) and y = (x 1 ,x 2 ,x 3 ,… ,x n ), a p- norm distance is defined as: n ∑ = − x y i i � 1-norm distance = 1 i 1 / 2 ⎛ 2 ⎟ ⎞ = ∑ n ⎜ − x y � 2-norm distance i i ⎝ ⎠ = 1 i 1 / p ⎛ ⎞ = ∑ n � p-norm distance ⎜ − p ⎟ x y i i ⎝ ⎠ = 1 i � In Euclidian space R n , distances are normally given as Euclidian distance (= 2-norm distance), which is a generalization of the Pythagorean theorem. � p need not be integer, but can not be less than 1.

  12. Comparing Protein Structures Distances � Mathematically, “distance” is a function which meets the following criteria: � One can find the distance between any two points. � That distance is a distinct real number. � It is positive definite. d( x , y ) ≥ 0, and d( x , y ) = 0 if and only if x = y . � It is symmetric. d( x , y ) = d( y , x ). � It satisfies the triangle inequality, d( x , z ) ≤ d( x , y ) + d( y , z ). � Such a function is known as a metric. � Geometrically, the right-hand part of the triangle inequality states that the sum of the lengths of any two sides of a triangle is greater than the length of the remaining side: Y Z X Comparing Protein Structures � Protein Structure Superposition � Assume that we know the correspondence set between A and B (e.g. NMR ensembles, induced fit) � Task: Find a rigid transformation RT which best superposes A = { a 1 … a n } and B = { b 1 … b n } � Many solutions in image analysis, mechanical engineering, … � Find a rigid transformation RT which minimizes the error function E: n ∑ 2 = − min ( ) E RT a b RT i i = 1 i

  13. Comparing Protein Structures � Protein Structure Superposition � The rigid transformation RT has a translational component T and a rotational component R: RT(a) = R(X) + T � The error function to be minimized becomes: n ∑ 2 = − + min ( ) E R a b T RT i i = 1 i Comparing Protein Structures � 1. The translational component � The error function is as its minimum with respect to the translation when: ∂ n E ( ) ∑ = − + = 2 ( ) 0 R a b T ∂ i i T = 1 i ⎛ ⎞ n n ∑ ∑ = − ⎜ ⎟ + T R a b i i ⎝ ⎠ = = 1 1 i i � If both structures A and B are centered on the coordinate origin, Σ a i and Σ b i become 0, and then also T = 0. � In the first step, we translate the centers of structures A and B to the origin of the coordinate system.

Recommend


More recommend