
BMI-206: Structure-Structure comparisons, Sequence-Structure comparisons



  1. BMI-206: Structure-Structure comparisons; Sequence-Structure comparisons. Marc A. Marti-Renom, Assistant Adjunct Professor, Department of Biopharmaceutical Sciences. February 3rd, 2005. BMI206 01/19/2005

  2. How to use these lectures: Ask! Outline: basic introduction; theory (representation, scoring, optimization); available programs; application. Assignment: the POM152 sequence (modeling exercise).

  3. Structure-Structure comparison. Outline: Before we start… some theory: coverage vs. accuracy. How can we compare structures… SALIGN (properties comparison), VAST (vector alignment), CE (local heuristic comparison), MAMMOTH (vector alignment). How we classify the structural space… SCOP (manual), CATH (semi-automatic), DBAli (fully automatic and comprehensive).

  4. Structure-Structure alignments. As with any other bioinformatics problem, three components are needed: a representation, a scoring function, and an optimizer.

  5. Representation of structures: all atoms and their coordinates; dihedral space (angles $\Omega_i$) or distance space (distances $d_i$); reduced atom representation (e.g., Cα atoms only); vector representation ($v_1, v_2, v_3$); secondary structure; accessible surface (and others).

  6. Scoring: raw scores. Amino acid substitutions; root mean square deviation over the $N$ equivalent atoms, $\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert x_i - x'_i \rVert^2}$; angles ($\Omega_i$) or distances ($d_i$); secondary structure (H, B, C); accessible surface (B, A [%]).
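
A minimal Python sketch of the RMSD term, assuming the two coordinate lists are already superposed and that position i in one list is equivalent to position i in the other (for example, aligned Cα atoms). Finding the optimal superposition is a separate step and is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between equivalent 3D points.

    Assumes both lists are already superposed and coords_a[i] is
    equivalent to coords_b[i] (e.g., aligned C-alpha atoms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two C-alpha pairs, each displaced by 1 Angstrom along y.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0)]
print(rmsd(a, b))  # 1.0
```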

  7. Scoring: significance of an alignment score (remember Patsy's class). The significance is the probability that the optimal alignment of two random sequences/structures, with the same length and composition as the aligned sequences/structures, has a score at least as good as that of the evaluated alignment. Empirical: sometimes approximated by a Z-score (normal distribution). Analytic (extreme value distribution): $P(S \ge x) = 1 - \exp\left(-e^{-\lambda(x-\mu)}\right)$, which for large $x$ is approximately $e^{-\lambda(x-\mu)}$. Karlin and Altschul, 1990 PNAS 87, pp2264
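
Both ways of judging significance can be sketched in a few lines of Python. The parameters `lam` and `mu` are arbitrary illustration values here; in practice they would be estimated for a given scoring system, and the random scores would come from aligning shuffled or unrelated sequences/structures.

```python
import math
import statistics


def evd_pvalue(x, lam, mu):
    """P(S >= x) under an extreme value (Gumbel) distribution.

    Uses -expm1(-y), which equals 1 - exp(-y), to stay accurate when the
    p-value is tiny.
    """
    y = math.exp(-lam * (x - mu))
    return -math.expm1(-y)


def z_score(score, random_scores):
    """Empirical Z-score of `score` relative to scores of random alignments."""
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    return (score - mean) / sd


print(evd_pvalue(5.0, lam=0.5, mu=5.0))          # 1 - 1/e, about 0.632
print(z_score(3.0, [1.0, 2.0, 3.0, 4.0, 5.0]))   # 0.0
```

For large scores `evd_pvalue` converges to the simpler tail form $e^{-\lambda(x-\mu)}$, which is why both expressions are used in practice.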

  8. Optimizer: global dynamic programming alignment (remember Patsy's class). Sequence/structure 1 has residues $r_i$, $i = 1 \ldots N$; sequence/structure 2 has residues $r_j$, $j = 1 \ldots M$. The matrix is filled with $D_{i,j} = \min\left\{ D_{i,j-1} + \mathrm{Score}(\text{gap}, r_j),\; D_{i-1,j-1} + \mathrm{Score}(r_i, r_j),\; D_{i-1,j} + \mathrm{Score}(r_i, \text{gap}) \right\}$. The best (lowest) alignment score is $D_{N,M}$, and backtracking from that cell gives the best alignment. Needleman and Wunsch (1970) J. Mol. Biol. 48(3), pp443
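
The recurrence above translates almost directly into Python. This is a minimal sketch: the unit substitution and gap costs used in the example are illustrative assumptions (with them the total cost is simply an edit distance), whereas real sequence or structure alignment would plug in substitution matrices or structural scores and usually affine gap penalties.

```python
def global_align(a, b, sub_cost, gap_cost):
    """Global alignment by dynamic programming, minimizing total cost.

    D[i][j] is the lowest cost of aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + gap_cost  # a[:i] against gaps only
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + gap_cost  # b[:j] against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i][j - 1] + gap_cost,                          # Score(gap, r_j)
                D[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # Score(r_i, r_j)
                D[i - 1][j] + gap_cost,                          # Score(r_i, gap)
            )
    # Backtrack from D[n][m] to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap_cost:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def unit_cost(x, y):
    return 0 if x == y else 1


cost, top, bottom = global_align("kitten", "sitting", unit_cost, gap_cost=1)
print(cost)  # 3
```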

  9. Optimizer: global vs. local alignment (remember Patsy's class). A global alignment spans the full length of both sequences/structures; a local alignment covers only their best-matching regions.
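
For contrast with the global recurrence, here is a minimal Smith-Waterman sketch of local alignment. It uses similarity scores that are maximized (rather than costs that are minimized), and the match/mismatch/gap values are assumptions chosen for illustration. The extra 0 option is what allows an alignment to start and end anywhere.

```python
def local_align_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local (Smith-Waterman) alignment score, as a similarity to maximize.

    H[i][j] is the best score of a local alignment ending at a[i-1] and b[j-1].
    The 0 option lets an alignment start fresh anywhere, which makes it local.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


print(local_align_score("TTTACGTTT", "ACG"))  # 6: the ACG/ACG match
```

A global alignment of the same pair would also have to pay for the unmatched T's at both ends; the local score ignores them.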

  10. Optimizer: multiple alignment (remember Patsy's class). Multiple alignments are built from pairwise alignments by following a tree. Example with 4 sequences A, B, C, D: (1) 6 pairwise comparisons, then cluster analysis to build the tree; (2) align the most similar pair (B-D); (3) align the next most similar pair (A-C); (4) align B-D with A-C, introducing a new gap in A-C to optimize its alignment with B-D.
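
The tree-building step can be sketched as agglomerative clustering over the pairwise distances. The slide only says "cluster analysis", so average linkage (as in UPGMA) is an assumption here, and the distances below are made-up numbers chosen to reproduce the merge order of the example. The profile-to-profile alignment performed at each merge is not shown.

```python
from itertools import combinations


def guide_tree_merges(names, dist):
    """Order in which clusters are merged, using average-linkage clustering.

    dist maps frozenset({x, y}) -> pairwise distance (lower = more similar).
    """
    clusters = {name: (name,) for name in names}  # label -> member sequences

    def linkage(c1, c2):
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(sorted(clusters), 2), key=lambda p: linkage(*p))
        clusters[f"{c1}-{c2}"] = clusters.pop(c1) + clusters.pop(c2)
        merges.append((c1, c2))
    return merges


# 4 sequences give 4 * 3 / 2 = 6 pairwise comparisons (hypothetical distances).
d = {frozenset(p): v for p, v in {
    ("A", "B"): 0.6, ("A", "C"): 0.3, ("A", "D"): 0.7,
    ("B", "C"): 0.8, ("B", "D"): 0.1, ("C", "D"): 0.9,
}.items()}
print(guide_tree_merges(["A", "B", "C", "D"], d))
# [('B', 'D'), ('A', 'C'), ('A-C', 'B-D')]
```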

  11. Coverage vs. accuracy: two structural alignments with the same RMSD (~2.5 Å) can differ in coverage (~90% vs. ~75% of Cα atoms aligned).

  12. Sequence-Structure alignment by properties conservation (SALIGN-MODELLER). [Figure: tree-guided alignment of A, B, C, D, ordered by similarity.] Pros: uses all available structural information; provides the optimal alignment. Cons: computationally expensive. Properties compared include angles ($\Omega_i$), distances ($d_i$), and RMSD; the score for aligning positions $i$ and $j$ is a weighted sum of property terms, $\mathrm{Score}_{i,j} = w_1 R_{i,j} + w_2 D_{i,j} + w_3 S_{a(i),a(j)} + w_4 B_{i,j} + w_5 I_{i,j} + w_6 X_{i,j}$. Madhusudhan et al. in preparation
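
The weighted-sum structure of the score can be illustrated with a small Python sketch. The actual SALIGN terms and weights are not given here, so the property names (`"aa"`, `"ss"`, `"acc"`), the weights, and the binary agreement measure are all hypothetical stand-ins; the point is only how several per-residue properties combine into one position-pair score matrix for the optimizer.

```python
def property_score_matrix(res_a, res_b, weights):
    """Score[i][j] = sum over properties k of w_k * agreement_k(i, j).

    res_a, res_b: lists of per-residue property dicts. The property names
    used below ("aa", "ss", "acc") are hypothetical stand-ins for the
    slide's terms; agreement is simply 1.0 if the values match, else 0.0.
    """
    return [
        [sum(w * (1.0 if ra[k] == rb[k] else 0.0) for k, w in weights.items())
         for rb in res_b]
        for ra in res_a
    ]


weights = {"aa": 1.0, "ss": 0.5, "acc": 0.25}  # hypothetical weights w_k
a = [{"aa": "A", "ss": "H", "acc": "B"}]
b = [{"aa": "A", "ss": "H", "acc": "B"}, {"aa": "G", "ss": "C", "acc": "B"}]
print(property_score_matrix(a, b, weights))  # [[1.75, 0.25]]
```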

  13. Structural alignment by properties conservation (SALIGN-MODELLER): http://alto.compbio.ucsf.edu/salign-cgi/index.cgi

  14. Vector Alignment Search Tool (VAST). Secondary structure elements (SSEs) are represented as vectors ($v_1, v_2, v_3$); a graph-theory search finds similar SSE arrangements, which are then refined by Monte Carlo at all-atom resolution and evaluated by Cα RMSD. Pros: good scoring system with significance. Cons: reduces the protein representation. Gibrat JF et al. (1996) Curr Opin Struct Biol 6(3) pp377
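
A rough Python sketch of the vector representation: each SSE is reduced to an axis vector, and pairs of SSEs are compared through their inter-vector angles. Both the first-to-last Cα axis and the angle-only `compatible` rule are simplifying assumptions, not VAST's actual method. In a graph-theory search, compatible SSE pairs would become edges, and the largest mutually consistent set of them would be sought.

```python
import math


def sse_vector(ca_coords):
    """Crude SSE axis: the vector from the first to the last C-alpha atom."""
    (x0, y0, z0), (x1, y1, z1) = ca_coords[0], ca_coords[-1]
    return (x1 - x0, y1 - y0, z1 - z0)


def angle_deg(u, v):
    """Angle between two 3D vectors, in degrees."""
    dot = sum(p * q for p, q in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def compatible(angle_in_a, angle_in_b, tol=20.0):
    """Hypothetical rule: two SSE pairs match if their angles agree within tol degrees."""
    return abs(angle_in_a - angle_in_b) <= tol


helix = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.5), (0.0, 0.0, 3.0)]  # along z
strand = [(5.0, 0.0, 0.0), (8.3, 0.0, 0.0)]                   # along x
print(round(angle_deg(sse_vector(helix), sse_vector(strand)), 6))  # 90.0
```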

  15. Vector Alignment Search Tool (VAST): http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml

  16. Incremental combinatorial extension (CE). Exhaustive combination of fragments: aligned fragment pairs (AFPs) of 8-residue peptides are compared using Cα distances ($d_i$), and the longest combination of AFPs is kept, using a heuristic similar to PSI-BLAST; the result is evaluated by Cα RMSD. Pros: FAST!; good quality of local alignments. Cons: complicated scoring and heuristics. Shindyalov IN and Bourne PE. (1998) Protein Eng. 11(9) pp739
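
The AFP step can be sketched in Python under simplifying assumptions. Here fragment similarity is measured by comparing intra-fragment Cα distance matrices, a superposition-free measure in the spirit of CE but not its exact formula, and the `threshold` cutoff is a hypothetical value. Combining AFPs into the longest consistent path, CE's extension step, is not shown.

```python
import math

FRAGMENT_LENGTH = 8  # 8-residue fragments, as on the slide


def fragments(coords, length=FRAGMENT_LENGTH):
    """All contiguous fragments (windows) of C-alpha coordinates."""
    return [coords[k:k + length] for k in range(len(coords) - length + 1)]


def fragment_dissimilarity(frag_a, frag_b):
    """Mean absolute difference of intra-fragment C-alpha distance matrices.

    Uses internal distances only, so no superposition is needed.
    """
    m = len(frag_a)
    if m != len(frag_b):
        raise ValueError("fragments must have the same length")
    total = 0.0
    for i in range(m):
        for j in range(m):
            total += abs(math.dist(frag_a[i], frag_a[j]) - math.dist(frag_b[i], frag_b[j]))
    return total / (m * m)


def aligned_fragment_pairs(coords_a, coords_b, threshold=3.0):
    """(i, j) start positions of fragment pairs with dissimilarity below threshold.

    threshold (in Angstrom) is a hypothetical cutoff chosen for illustration.
    """
    fa, fb = fragments(coords_a), fragments(coords_b)
    return [(i, j) for i, x in enumerate(fa) for j, y in enumerate(fb)
            if fragment_dissimilarity(x, y) < threshold]


chain = [(3.8 * k, 0.0, 0.0) for k in range(10)]      # straight toy C-alpha trace
moved = [(x + 10.0, y - 4.0, z) for x, y, z in chain]  # same trace, translated
print(len(aligned_fragment_pairs(chain, moved)))       # 9
```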

  17. Incremental combinatorial extension (CE): http://cl.sdsc.edu/ce.html

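  18. Matching molecular models obtained from theory (MAMMOTH). Proteins are reduced to vectors ($v_1, v_2, v_3$) and compared by their unit-vector RMS (URMS). Pros: VERY FAST!; good scoring system with significance. Cons: reduces the protein representation. A reference value $\mathrm{URMS}_R$ is computed from the length $n$ (the fit uses the constants 2.84 and 2.0), and the score is normalized as $D_S = (\mathrm{URMS}_R - \mathrm{URMS}_{AB}) / \mathrm{URMS}_R$. Ortiz AR, et al. (2002) Protein Sci. 11 pp2606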

  19. Matching molecular models obtained from theory (MAMMOTH): http://fulcrum.physbio.mssm.edu:8083/

  20. Classification of the structural space: the SCOP classification visualized with Large Graph Layout (Alex Adai). Adai AT, Date SV, Wieland S, Marcotte EM. J Mol Biol. 2004 Jun 25;340(1):179-90. http://bioinformatics.icmb.utexas.edu/lgl/

  21. SCOP 1.65 database, http://scop.mrc-lmb.cam.ac.uk/scop/. Pros: widely recognized as the "gold standard"; manual classification; clear classification of structures into CLASS, FOLD, SUPERFAMILY, and FAMILY; a large number of tools already available. Cons: manual classification; not 100% up to date; domain boundary definition.

     Class                               Folds  Superfamilies  Families
     All alpha proteins                    179            299       480
     All beta proteins                     126            248       462
     Alpha and beta proteins (a/b)         121            199       542
     Alpha and beta proteins (a+b)         234            349       567
     Multi-domain proteins                  38             38        53
     Membrane and cell surface proteins     36             66        73
     Small proteins                         66             95       150
     Total                                 800           1294      2327

     Murzin A. G., et al. (1995). J. Mol. Biol. 247, 536-540.

  22. CATH 2.5.1 database, http://www.biochem.ucl.ac.uk/bsm/cath/. Uses FSSP for superimposition. Pros: recognized as the "gold standard"; semi-automatic classification; clear classification of structures into CLASS, ARCHITECTURE, TOPOLOGY, and HOMOLOGOUS SUPERFAMILY; a large number of tools already available; easy to navigate. Cons: semi-automatic classification; domain boundary definition. Orengo, C.A., et al. (1997) Structure. 5. 1093-1108.

  23. DBAli v2.0 database, http://salilab.org/DBAli/. Uses MAMMOTH for superimposition. Pros: fully automatic; data are kept up to date with PDB releases; tools for "on the fly" classification of families; easy to navigate; provides some tools for structure comparison. Cons: does not (yet) provide a stable classification. Last updated: January 25th, 2005. Number of chains in database: 60,656. Number of structure-structure comparisons: 650,783,375. Marti-Renom et al. 2001. Bioinformatics. 17, 746

  24. Classification of the structural space: not an easy task! It requires both domain definition AND domain classification. [Figure: SCOP, CATH, and DALI compared; same class, same domain.] Day, et al. (2003) Protein Science, 12 pp2150


  26. Sequence-Structure comparison. Outline: Before we start… some theory: domain boundaries. Structural predictions from sequence… SALIGN (gap penalties and substitution matrices), mGenThreader (SSE prediction and alignment/potential scores), Fugue (gap penalties and substitution matrices), 3D-Jury (as a meta-server example).

  27. General overview (threading). Threading matches sequences to 3D structures. It requires a scoring function to assess the fit of a sequence to a given fold. Scoring functions are derived from known structures and include atom contact and solvation terms evaluated in a pairwise fashion; they may also include secondary structure terms, multiple alignments, and more. Threading servers using several different approaches are available: Fold recognition server at Imperial College, UK, http://www.sbg.bio.ic.ac.uk/~3dpssm/; PredictProtein server at EMBL, http://www.embl-heidelberg.de/predictprotein/predictprotein.html; Protein sequence-structure threading at NCBI, http://www.ncbi.nlm.nih.gov/Structure/RESEARCH/threading.shtml
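
A toy threading score in Python shows how a sequence is judged against a fold. Several simplifying assumptions apply: the sequence is placed on the template position by position without gaps, contacts are residue-level rather than atom-level, solvation is reduced to a per-position exposure term, and every energy value is made up. Real potentials are derived statistically from known structures.

```python
def threading_energy(sequence, contacts, exposed, contact_potential, solvation):
    """Energy of `sequence` threaded onto a template fold (lower = better fit).

    contacts: pairs (i, j) of template positions in spatial contact.
    exposed: set of template positions that are solvent exposed.
    contact_potential: dict (aa1, aa2) -> energy, keys in sorted order.
    solvation: dict aa -> energy for placing that residue at an exposed position.
    """
    energy = 0.0
    for i, j in contacts:
        pair = tuple(sorted((sequence[i], sequence[j])))
        energy += contact_potential.get(pair, 0.0)
    for pos, aa in enumerate(sequence):
        if pos in exposed:
            energy += solvation.get(aa, 0.0)
    return energy


# Hypothetical toy template: positions 0 and 2 touch; position 1 is exposed.
contacts = [(0, 2)]
exposed = {1}
contact_potential = {("L", "L"): -1.0, ("K", "L"): 0.5}  # made-up values
solvation = {"L": 1.0, "K": -0.5}                         # made-up values
for seq in ("LKL", "KLL"):
    print(seq, threading_energy(seq, contacts, exposed, contact_potential, solvation))
# LKL -1.5
# KLL 1.5
```

With these toy numbers, "LKL" fits the template better: its buried leucines touch each other and the lysine sits at the exposed position.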

  28. Template comparison methods. These use 3D "templates" to search structural databases: active-site or binding-site templates are generated to reflect functionally important structural signatures. Available software/servers: Template Search and Superposition (TESS), Thornton Group, http://www.biochem.ucl.ac.uk/bsm/PROCAT/PROCAT.html, Wallace AC, Borkakoti N, Thornton JM. (1997) Protein Science 6 pp2308; "Fuzzy Functional Forms", Skolnick (commercially available), Fetrow JS and Skolnick J (1998) J. Mol. Biol. 281 pp949; Spatial Arrangements of Side-chain and Main-chain (SPASM), Kleywegt, Univ. of Uppsala, http://portray.bmc.uu.se/cgi-bin/dennis/spasm.pl, Kleywegt GJ (1999). J. Mol. Biol. 285 pp1887
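
The matching criterion behind a 3D template search can be sketched as a brute-force Python search over atom tuples. This is only an illustration: TESS itself uses geometric hashing, whereas this version is exponential in the template size and is practical only for tiny inputs. The tolerance `tol` is a hypothetical value.

```python
import math
from itertools import permutations


def find_template_matches(template, atoms, tol=0.5):
    """Brute-force 3D template search.

    template: list of (x, y, z) points, e.g. key active-site atoms.
    atoms: dict atom_id -> (x, y, z) for the structure being searched.
    Returns tuples of atom ids (in template order) whose pairwise distances
    all agree with the template's pairwise distances within tol Angstrom.
    Distances alone cannot tell a template from its mirror image.
    """
    k = len(template)
    wanted = {(p, q): math.dist(template[p], template[q])
              for p in range(k) for q in range(p + 1, k)}
    hits = []
    for combo in permutations(atoms, k):
        if all(abs(math.dist(atoms[combo[p]], atoms[combo[q]]) - d) <= tol
               for (p, q), d in wanted.items()):
            hits.append(combo)
    return hits


template = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]  # 3-4-5 triangle
atoms = {"A": (10.0, 10.0, 10.0), "B": (13.0, 10.0, 10.0),
         "C": (10.0, 14.0, 10.0), "D": (50.0, 50.0, 50.0)}
print(find_template_matches(template, atoms))  # [('A', 'B', 'C')]
```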
